REMARKS

Claims 21-40 are pending in the application. Applicants reserve the right to prosecute non- 
elected subject matter in subsequent divisional applications. 

Comments Regarding Restriction Requirement 

Applicants affirm the election with traverse of Group II, which corresponds to newly added
claims 23-31 drawn to a polynucleotide, vector, host cell, and method for producing a polypeptide. 
Newly added claims 23-31 replace original claims 3-6 and 9-14, and are drawn to substantially the 
same invention, but are of a different scope. 

Applicants respectfully submit that there is minimal additional burden on the Examiner to 
examine newly added claims 39 and 40, which are drawn to microarrays using the elected 
polynucleotides. 

Applicants request that the Examiner withdraw the Restriction Requirement at least with respect 
to claims 21, 22, 35, and 36 of Group I, and examine those claims together with the elected 
polynucleotide claims of Group II.

The rules under MPEP section 1893.03(d) require the Examiner to apply the Unity of Invention
standard of PCT Rule 13.2, rather than U.S. restriction/election of species practice, in national stage
applications such as the instant application filed under 35 U.S.C. § 371. Applicants believe that, under
the rules concerning unity of invention under the Patent Cooperation Treaty, unity of invention exists
between the claims drawn to the polypeptide sequence of SEQ ID NO:1 (i.e., claims 21, 22, 35, and 36)
and the claims drawn to the elected polynucleotide sequence of SEQ ID NO:2, which encodes
SEQ ID NO:1 (i.e., claims 23-31). The Administrative Instructions Under The Patent Cooperation
Treaty, Annex B, Unity of Invention, Part 2, "Examples Concerning Unity of Invention" provide the
following guidelines with regard to unity of invention between a protein and the polynucleotide that
encodes it:

Example 17
Claim 1: Protein X. 

Claim 2: DNA sequence encoding protein X. 

Expression of the DNA sequence in a host results in the production of a protein which is 
determined by the DNA sequence. The protein and the DNA sequence exhibit corresponding 
special technical features. Unity between claims 1 and 2 is accepted. 

109025 6 09/830,914 



Docket No.: PF-0621 USN



As currently pending, the claims of Group II drawn to polynucleotides and the claims of Group
I drawn to polypeptides do not encompass prior art, and the "objection of lack of unity" based on the
reference of Kinkema et al. (Accession Q39157) no longer applies. Therefore, Applicants request that
the Examiner withdraw the Restriction Requirement, at least with respect to claims 21, 22, 35, and 36
of Group I, and examine those claims together with the elected polynucleotide claims of Group II.

Rejoinder of method claims upon allowance of product claims under U.S. practice 
The Examiner is reminded that claims 32-34 and 38, drawn to methods of using the elected
polynucleotides of Group II, should be rejoined per the Commissioner's Notice in the Official Gazette
of March 26, 1996, entitled "Guidance on Treatment of Product and Process Claims in light of In re
Ochiai, In re Brouwer and 35 U.S.C. § 103(b)," which sets forth the rules, upon allowance of product
claims, for rejoinder of process claims covering the same scope of products. Applicants request that
claims 32-34 and 38 be rejoined and examined upon allowance of the claims drawn to the
polynucleotides of Group II.

Objections to the Claims

Original claims 3-6 were objected to because of their dependence from original claim 1. New 
claims 23-26 similarly depend from claim 21, drawn to nonelected polypeptides. As mentioned above,
Applicants believe that the claims drawn to the polypeptides of the invention, according to the unity of 
invention standard, should be examined with the elected claims drawn to the polynucleotides currently 
under examination. Applicants request reconsideration and believe amending these claims at this time 
would be premature. 

Original claims 4 and 10 were objected to as being in improper dependent form. These claims 
have now been replaced by new claims 23 and 30, which are believed to be in proper form. 
Withdrawal of the objections is therefore respectfully requested. 
Utility Rejections under 35 U.S.C. §101 and §112, First Paragraph

Original claims 3-6 and 9-14, now replaced by new claims 23-31, have been rejected under 
35 U.S.C. §101 and §112, first paragraph, because the claimed invention allegedly "is not supported
by either a credible asserted utility or a well-established utility" (Office Action, page 3). These 
rejections are traversed. 

The rejection of claims 23-31 is improper, as the inventions of those claims have a 
patentable utility as set forth in the instant specification, and/or a utility well known to one of 
ordinary skill in the art. 

The invention at issue is a polynucleotide sequence corresponding to a gene that is expressed in 
hematopoietic/immune system, gastrointestinal, musculoskeletal, and reproductive tissues, and in tissues 
associated with cancer (Specification at page 18, lines 12-17). In particular, similarities between SEQ
ID NO:1 and C. elegans myosin (g1279777) and H. annuus unconventional myosin (g2444174),
including the presence of myosin head domain, myosin heavy chain, and light chain binding site 
signatures, are described in the specification, for example, at page 17, line 30 through page 18, line 9. 
The specification points out the roles of myosin in muscle contraction, intracellular movement, 
phagocytosis, and cytokinesis, and describes various diseases associated with myosin dysfunction, 
including muscle disorders, cardiovascular disease, deafness, and cancer (Specification at pages 1-2). 
As such, the claimed invention has numerous practical, beneficial uses in toxicology testing, drug 
development, and the diagnosis of disease, none of which requires knowledge of how the polypeptide 
coded for by the polynucleotide actually functions. 

Applicants submit with this paper the Declaration of Dr. Tod Bedilion* describing some of the 
practical uses of the claimed invention in gene and protein expression monitoring applications. The 
Bedilion Declaration demonstrates that the positions and arguments made by the Patent Examiner with 
respect to the utility of the claimed polynucleotide are without merit. 



*The Bedilion Declaration is submitted herewith in unexecuted form. The executed Declaration
will be submitted to the Patent Office as soon as it is available.
The Bedilion Declaration describes, in particular, how the claimed expressed polynucleotide
can be used in gene expression monitoring applications that were well known at the time the patent
application was filed, and how those applications are useful in developing drugs and monitoring their
activity. Dr. Bedilion states that the claimed invention is a useful tool when employed as a highly
specific probe in a cDNA microarray:

Persons skilled in the art would appreciate that cDNA microarrays that contained the SEQ ID
NO:1-encoding polynucleotides would be a more useful tool than cDNA microarrays that did
not contain the polynucleotides in connection with conducting gene expression monitoring 
studies on proposed (or actual) drugs for treating heart and skeletal muscle disorders, 
developmental disorders, and cell proliferative disorders, including cancer for such purposes as 
evaluating their efficacy and toxicity. 

The Patent Examiner does not dispute that the claimed polynucleotide can be used as a probe 
in cDNA microarrays and used in gene expression monitoring applications. Instead, the Patent 
Examiner contends that the claimed polynucleotide cannot be useful without precise knowledge of its 
biological function. But the law never has required knowledge of biological function to prove utility. It 
is the claimed invention's uses, not its functions, that are the subject of a proper analysis under the utility 
requirement. 

In any event, as demonstrated by the Bedilion Declaration, the person of ordinary skill in the art 
can achieve beneficial results from the claimed polynucleotide in the absence of any knowledge as to 
the precise function of the protein encoded by it. The uses of the claimed polynucleotide in gene 
expression monitoring applications are in fact independent of its precise function. 

I. The Applicable Legal Standard 

To meet the utility requirement of sections 101 and 112 of the Patent Act, the patent applicant
need only show that the claimed invention is "practically useful," Anderson v. Natta, 480 F.2d 1392, 
1397, 178 USPQ 458 (CCPA 1973) and confers a "specific benefit" on the public. Brenner v. 
Manson, 383 U.S. 519, 534-35, 148 USPQ 689 (1966). As discussed in a recent Court of Appeals 
for the Federal Circuit case, this threshold is not high: 
An invention is "useful" under section 101 if it is capable of providing some identifiable benefit.
See Brenner v. Manson, 383 U.S. 519, 534 [148 USPQ 689] (1966); Brooktree Corp. v.
Advanced Micro Devices, Inc., 977 F.2d 1555, 1571 [24 USPQ2d 1401] (Fed. Cir. 1992)
("to violate Section 101 the claimed device must be totally incapable of achieving a useful 
result"); Fuller v. Berger, 120 F. 274, 275 (7th Cir. 1903) (test for utility is whether invention 
"is incapable of serving any beneficial end"). 

Juicy Whip Inc. v. Orange Bang Inc., 51 USPQ2d 1700 (Fed. Cir. 1999).

While an asserted utility must be described with specificity, the patent applicant need not
demonstrate utility to a certainty. In Stiftung v. Renishaw PLC, 945 F.2d 1173, 1180, 20 USPQ2d
1094 (Fed. Cir. 1991), the United States Court of Appeals for the Federal Circuit explained:

An invention need not be the best or only way to accomplish a certain result, and it need only 
be useful to some extent and in certain applications: "[T]he fact that an invention has only limited 
utility and is only operable in certain applications is not grounds for finding lack of utility." 
Envirotech Corp. v. Al George, Inc., 730 F.2d 753, 762, 221 USPQ 473, 480 (Fed. Cir.
1984). 

The specificity requirement is not, therefore, an onerous one. If the asserted utility is described 
so that a person of ordinary skill in the art would understand how to use the claimed invention, it is 
sufficiently specific. See Standard Oil Co, v. Montedison, S.p.a., 212 U.S.P.Q. 327, 343 (3d Cir. 
1981). The specificity requirement is met unless the asserted utility amounts to a "nebulous expression" 
such as "biological activity" or "biological properties" that does not convey meaningful information 
about the utility of what is being claimed. Cross v. lizuka, 753 F.2d 1040, 1048 (Fed. Cir. 1985). 

In addition to conferring a specific benefit on the public, the benefit must also be "substantial." 
Brenner, 383 U.S. at 534. A "substantial" utility is a practical, "real-world" utility. Nelson v. Bowler, 
626 F.2d 853, 856, 206 USPQ 881 (CCPA 1980). 

If persons of ordinary skill in the art would understand that there is a "well-established" utility 
for the claimed invention, the threshold is met automatically and the applicant need not make any 
showing to demonstrate utility. Manual of Patent Examining Procedure at § 706.03(a). Only if there
is no "well-established" utility for the claimed invention must the applicant demonstrate the practical
benefits of the invention. Id.

Once the patent applicant identifies a specific utility, the claimed invention is presumed to 
possess it. In re Cortright, 165 F.3d 1353, 1357, 49 USPQ2d 1464 (Fed. Cir. 1999); In re Brana, 
51 F.3d 1560, 1566, 34 USPQ2d 1436 (Fed. Cir. 1995). In that event, the Patent Office bears the
burden of demonstrating that a person of ordinary skill in the art would reasonably doubt that the 
asserted utility could be achieved by the claimed invention. Id. To do so, the Patent Office must 
provide evidence or sound scientific reasoning. See In re Langer, 503 F.2d 1380, 1391-92, 183 
USPQ 288 (CCPA 1974). If and only if the Patent Office makes such a showing, the burden shifts to 
the applicant to provide rebuttal evidence that would convince the person of ordinary skill that there is 
sufficient proof of utility. Brana, 51 F.3d at 1566. The applicant need only prove a "substantial 
likelihood" of utility; certainty is not required. Brenner, 383 U.S. at 532. 

II. Uses of the claimed polynucleotide for diagnosis of conditions or diseases characterized
by expression of MHCH, for toxicology testing, and for drug discovery are sufficient
utilities under 35 U.S.C. §§ 101 and 112, first paragraph

The claimed invention meets all of the necessary requirements for establishing a credible utility 
under the patent laws: there are "well-established" uses for the claimed invention known to persons of
ordinary skill in the art, and there are specific practical and beneficial uses for the invention disclosed in 
the patent application's specification. These uses are explained, in detail, in the Bedilion Declaration 
accompanying this response. Objective evidence, not considered by the Patent Office, further 
corroborates the credibility of the asserted utilities. 

A. The uses of MHCH for toxicology testing, drug discovery, and disease diagnosis
are practical uses that confer "specific benefits" on the public

The claimed invention has specific, substantial, real-world utility by virtue of its use in toxicology 
testing, drug development and disease diagnosis through gene expression profiling. These uses are 
explained in detail in the accompanying Bedilion Declaration, the substance of which is not rebutted by 
the Patent Examiner. There is no dispute that the claimed invention is in fact a useful tool in cDNA 
microarrays used to perform gene expression analysis. That is sufficient to establish utility for the 
claimed polynucleotide. 

In his Declaration, Dr. Bedilion explains the many reasons why a person skilled in the art 
reading the Tang '248 application on November 5, 1998 would have understood that application to 
disclose the claimed polynucleotide to be useful for a number of gene expression monitoring 
applications, e.g., as a highly specific probe for the expression of that specific polynucleotide in
connection with the development of drugs and the monitoring of the activity of such drugs (Bedilion
Declaration at, e.g., ¶¶ 10-15). Much, but not all, of Dr. Bedilion's explanation concerns the use of the
claimed polynucleotide in cDNA microarrays of the type first developed at Stanford University for
evaluating the efficacy and toxicity of drugs, as well as for other applications (Bedilion Declaration,
¶¶ 12 and 15).**

In connection with his explanations, Dr. Bedilion states that the "Tang '248 specification would
have led a person skilled in the art on November 5, 1998 who was using gene expression monitoring in
connection with working on developing new drugs for the treatment of heart and skeletal muscle
disorders, developmental disorders, and cell proliferative disorders, including cancer [a] to conclude
that a cDNA microarray that contained the SEQ ID NO:1-encoding polynucleotides would be a highly
useful tool, and [b] to request specifically that any cDNA microarray that was being used for such
purposes contain the SEQ ID NO:1-encoding polynucleotides" (Bedilion Declaration, ¶ 15). For
example, as explained by Dr. Bedilion, "[p]ersons skilled in the art would [have appreciated on
November 5, 1998] that a cDNA microarray that contained the SEQ ID NO:1-encoding
polynucleotides would be a more useful tool than a cDNA microarray that did not contain the
polynucleotides in connection with conducting gene expression monitoring studies on proposed (or
actual) drugs for treating heart and skeletal muscle disorders, developmental disorders, and cell
proliferative disorders, including cancer for such purposes as evaluating their efficacy and toxicity." Id.

In support of those statements, Dr. Bedilion provided detailed explanations of how cDNA
technology can be used to conduct gene expression monitoring evaluations, with extensive citations to
pre-November 5, 1998 publications showing the state of the art on November 5, 1998 (Bedilion
Declaration, ¶¶ 10-14). While Dr. Bedilion's explanations in paragraph 15 of his Declaration include
almost three pages of text and six subparts (a)-(f), he specifically states that his explanations are not
"all-inclusive." Id. For example, with respect to toxicity evaluations, Dr. Bedilion had earlier explained
how persons skilled in the art who were working on drug development on November 5, 1998 (and for
several years prior to November 5, 1998) "without any doubt" appreciated that the toxicity (or lack of
toxicity) of any proposed drug was "one of the most important criteria to be evaluated in connection
with the development of the drug," and how the teachings of the Tang '248 application clearly include
using differential gene expression analyses in toxicity studies (Bedilion Declaration, ¶ 10).

**Dr. Bedilion also explained, for example, why persons skilled in the art would also appreciate,
based on the Tang '248 specification, that the claimed polynucleotide would be useful in connection
with developing new drugs using technology, such as Northern analysis, that predated by many years
the development of the cDNA technology (Bedilion Declaration, ¶ 16).

Thus, the Bedilion Declaration establishes that persons skilled in the art reading the Tang '248 
application at the time it was filed "would have wanted their cDNA microarray to have a [SEQ ID
NO:1-encoding polynucleotide probe] because a microarray that contained such a probe (as compared
to one that did not) would provide more useful results in the kind of gene expression monitoring studies
using cDNA microarrays that persons skilled in the art have been doing since well prior to November
5, 1998" (Bedilion Declaration, ¶ 15, item (f)). This, by itself, provides more than sufficient reason to
compel the conclusion that the Tang '248 application disclosed to persons skilled in the art at the time 
of its filing substantial, specific and credible real-world utilities for the claimed polynucleotide. 

Nowhere does the Patent Examiner address the fact that, as described on pp. 31-32 of the 
Tang '248 application, the claimed polynucleotides can be used as highly specific probes in, for 
example, cDNA microarrays - probes that without question can be used to measure both the existence 
and amount of complementary RNA sequences known to be the expression products of the claimed 
polynucleotides. The claimed invention is not, in that regard, some random sequence whose value as a 
probe is speculative or would require further research to determine. 

Given the fact that the claimed polynucleotide is known to be expressed, its utility as a 
measuring and analyzing instrument for expression levels is as indisputable as a scale's utility for 
measuring weight. This use as a measuring tool, regardless of how the expression level data ultimately 
would be used by a person of ordinary skill in the art, by itself demonstrates that the claimed invention 
provides an identifiable, real-world benefit that meets the utility requirement. Raytheon v. Roper, 724
F.2d 951 (Fed. Cir. 1983) (claimed invention need only meet one of its stated objectives to be useful);
In re Cortright, 165 F.3d 1353, 1359 (Fed. Cir. 1999) (how the invention works is irrelevant to
utility); MPEP § 2107 ("Many research tools such as gas chromatographs, screening assays, and
nucleotide sequencing techniques have a clear, specific, and unquestionable utility (e.g., they are useful
in analyzing compounds)" (emphasis added)).
Although Applicants need not prove as much to demonstrate utility, there can be no reasonable dispute
that persons of ordinary skill in the art have numerous uses for information about relative gene
expression including, for example, understanding the effects of a potential drug for treating heart and
skeletal muscle disorders, developmental disorders, and cell proliferative disorders, including cancer.
Because the patent application states explicitly that the claimed polynucleotide is known to be
expressed in hematopoietic/immune system, gastrointestinal, musculoskeletal, and reproductive tissues,
and in tissues associated with cancer (Specification at page 18, lines 12-17), and encodes a protein
that is a member of the myosin family known to be associated with diseases such as heart and skeletal
muscle disorders, developmental disorders, and cell proliferative disorders, including cancer, there can
be no reasonable dispute that a person of ordinary skill in the art could put the claimed invention to such
use. In other words, the person of ordinary skill in the art can derive more information about a
potential toxin, or about a drug candidate for treating heart and skeletal muscle disorders, developmental
disorders, and cell proliferative disorders, including cancer, with the claimed invention than without it
(see Bedilion Declaration at, e.g., ¶ 15, subparts (e)-(f)).

The Bedilion Declaration shows that a number of pre-November 5, 1998 publications confirm
and further establish the utility of cDNA microarrays in a wide range of drug development gene
expression monitoring applications at the time the Tang '248 application was filed (Bedilion Declaration,
¶¶ 10-14; Bedilion Exhibits A-G). Indeed, Brown and Shalon U.S. Patent No. 5,807,522 (the Brown
'522 patent, Bedilion Exhibit D), which issued from a patent application filed in June 1995 and was
effectively published on December 29, 1995 as a result of the publication of a PCT counterpart
application, shows that the Patent Office recognizes the patentable utility of the cDNA technology
developed in the early to mid-1990s. As explained by Dr. Bedilion, among other things (Bedilion
Declaration, ¶ 12):

The Brown '522 patent further teaches that the "[m]icroarrays of immobilized nucleic 
acid sequences prepared in accordance with the invention" can be used in "numerous" 
genetic applications, including "monitoring of gene expression" applications (see 
Bedilion Tab D at col. 14, lines 36-42). The Brown '522 patent teaches (a) monitoring 
gene expression (i) in different tissue types, (ii) in different disease states, and (iii) in 
response to different drugs, and (b) that arrays disclosed therein may be used in 
toxicology studies (see Bedilion Tab D at col. 15, lines 13-18 and 52-58 and col. 18, 
lines 25-30). 
Literature reviews published shortly after the filing of the Tang '248 application describing the
state of the art further confirm the claimed invention's utility. Rockett et al. confirm, for example, that 
the claimed invention is useful for differential expression analysis regardless of how expression is 
regulated: 

Despite the development of multiple technological advances which have recently 
brought the field of gene expression profiling to the forefront of molecular analysis, 
recognition of the importance of differential gene expression and characterization of
differentially expressed genes has existed for many years. 

* * * 

Although differential expression technologies are applicable to a broad range of models, 
perhaps their most important advantage is that, in most cases, absolutely no prior 
knowledge of the specific genes which are up- or down-regulated is required. 

* * * 

Whereas it would be informative to know the identity and functionality of all genes 
up/down regulated by . . . toxicants, this would appear a longer term goal .... 
However, the current use of gene profiling yields a pattern of gene changes for a 
xenobiotic of unknown toxicity which may be matched to that of well characterized 
toxins, thus alerting the toxicologist to possible in vivo similarities between the unknown 
and the standard, thereby providing a platform for more extensive toxicological 
examination. (emphasis added)

Rockett et al., Differential gene expression in drug metabolism and toxicology: practicalities, problems
and potential, 29 Xenobiotica No. 7, 655 (1999).

In another pre-November 5, 1998 article, Lashkari et al. state explicitly that sequences merely
"predicted" to be expressed (predicted Open Reading Frames, or ORFs), unlike the claimed invention,
which is in fact known to be expressed, have numerous uses:

Efforts have been directed toward the amplification of each predicted ORF or any 
other region of the genome ranging from a few base pairs to several kilobase pairs. 
There are many uses for these amplicons - they can be cloned into standard vectors or
specialized expression vectors, or can be cloned into other specialized vectors such as
those used for two-hybrid analysis. The amplicons can also be used directly by, for
example, arraying onto glass for expression analysis, for DNA binding assays, or for
any direct DNA assay.



Lashkari et al., Whole genome analysis: Experimental access to all genome sequenced segments 
through larger-scale efficient oligonucleotide synthesis and PCR, 94 Proc. Nat. Acad. Sci. 8945 (Aug.
1997) (emphasis added). 

B. The use of nucleic acids coding for proteins expressed by humans as tools for
toxicology testing, drug discovery, and the diagnosis of disease is now "well-established"

The technologies made possible by expression profiling and the DNA tools upon which they 
rely are now well-established. The technical literature recognizes not only the prevalence of these 
technologies, but also their unprecedented advantages in drug development, testing and safety 
assessment. These technologies include toxicology testing, as described by Bedilion in his Declaration. 

Toxicology testing is now standard practice in the pharmaceutical industry. See, e.g., John C. 
Rockett et al., supra: 

Knowledge of toxin-dependent regulation in target tissues is not solely an academic pursuit as 
much interest has been generated in the pharmaceutical industry to harness this technology in 
the early identification of toxic drug candidates, thereby shortening the developmental process 
and contributing substantially to the safety assessment of new drugs. 

To the same effect are several other scientific publications, including Emile F. Nuwaysir et al.,
Microarrays and Toxicology: The Advent of Toxicogenomics, 24 Molecular Carcinogenesis 153
(1999); Sandra Steiner and N. Leigh Anderson, Expression profiling in toxicology - potentials and
limitations, 112-113 Toxicology Letters 467 (2000).

Nucleic acids useful for measuring the expression of whole classes of genes are routinely
incorporated for use in toxicology testing. Nuwaysir et al. describe, for example, a Human ToxChip
comprising 2089 human clones, which were selected

for their well-documented involvement in basic cellular processes as well as their responses to 
different types of toxic insult. Included on this list are DNA replication and repair genes, 
apoptosis genes, and genes responsive to PAHs and dioxin-like compounds, peroxisome 
proliferators, estrogenic compounds, and oxidant stress. Some of the other categories of genes 
include transcription factors, oncogenes, tumor suppressor genes, cyclins, kinases, 
phosphatases, cell adhesion and motility genes, and homeobox genes. Also included in this 
group are 84 housekeeping genes, whose hybridization intensity is averaged and used for signal 
normalization of the other genes on the chip. 
See also Table 1 of Nuwaysir et al. (listing additional classes of genes deemed to be of special interest
in making a human toxicology microarray). 

The more genes that are available for use in toxicology testing, the more powerful the technique.
"Arrays are at their most powerful when they contain the entire genome of the species they are being
used to study." John C. Rockett and David J. Dix, Application of DNA Arrays to Toxicology, 107
Environ. Health Perspect. 681, No. 8 (1999). Control genes are carefully selected for their stability
across a large set of array experiments in order to best study the effect of toxicological compounds.
See the attached email from the primary investigator on the Nuwaysir paper, Dr. Cynthia Afshari, to an
Incyte employee, dated July 3, 2000, as well as the original message to which she was responding,
indicating that even the expression of carefully selected control genes can be altered. Thus, there is no
expressed gene which is irrelevant to screening for toxicological effects, and all expressed genes have a 
utility for toxicological screening. 

In fact, the potential benefits to the public, in terms of lives saved and reduced health care costs,
are enormous. Recent developments provide evidence that the benefits of this information are already 
beginning to manifest themselves. Examples include the following: 

• In 1999, CV Therapeutics, an Incyte collaborator, was able to use Incyte gene 
expression technology, information about the structure of a known transporter gene, 
and chromosomal mapping location, to identify the key gene associated with Tangier
disease. This discovery took place over a matter of only a few weeks, due to the 
power of these new genomics technologies. The discovery received an award from the 
American Heart Association as one of the top 10 discoveries associated with heart 
disease research in 1999. 

• In an April 9, 2000, article published by the Bloomberg news service, an Incyte 
customer stated that it had reduced the time associated with target discovery and 
validation from 36 months to 18 months, through use of Incyte's genomic information
database. Other Incyte customers have privately reported similar experiences. The 
implications of this significant saving of time and expense for the number of drugs that 
may be developed and their cost are obvious. 

• In a February 10, 2000, article in the Wall Street Journal, one Incyte customer stated 
that over 50 percent of the drug targets in its current pipeline were derived from the 
Incyte database. Other Incyte customers have privately reported similar experiences. 
By doubling the number of targets available to pharmaceutical researchers, Incyte 
genomic information has demonstrably accelerated the development of new drugs. 
Because the Patent Examiner failed to address or consider the "well-established" utilities for the
claimed invention in toxicology testing, drug development, and the diagnosis of disease, the Examiner's 
rejections should be overturned regardless of their merit. 

C. The Uncontested Fact That the Claimed Polynucleotide Encodes a Protein
in the Myosin Family Also Demonstrates Utility

In addition to having substantial, specific and credible utilities in numerous gene expression 
monitoring applications, it is undisputed that the claimed polynucleotide encodes a protein having the
sequence shown as SEQ ID NO:1 in the patent application and referred to as MHCH in that
application. Applicants have demonstrated that MHCH is a member of the myosin family, and that
the myosin family of proteins includes motor proteins that are involved in muscle contraction, 
intracellular movement, phagocytosis, and cytokinesis. 

The Patent Examiner does not dispute any of the facts set forth in the previous paragraph. 
Neither does the Patent Examiner dispute that, if a polynucleotide encodes for a protein that has a 
substantial, specific and credible utility, then it follows that the polynucleotide also has a substantial, 
specific and credible utility. 

The Examiner must accept the applicant's demonstration that the polypeptide encoded by the 
claimed invention is a member of the myosin family and that utility is proven by a reasonable probability 
unless the Examiner can demonstrate through evidence or sound scientific reasoning that a person of 
ordinary skill in the art would doubt utility. See In re Langer, 503 F.2d 1380, 1391-92, 183 USPQ
288 (CCPA 1974). The Examiner has not provided sufficient evidence or sound scientific reasoning to 
the contrary. 

Nor has the Examiner provided any evidence that any member of the myosin family, let alone a 
substantial number of those members, is not useful. In such circumstances, the only reasonable 
inference is that the polypeptide encoded by the claimed invention must be useful, like the other 
members of the myosin family. 
D. Objective evidence corroborates the utilities of the claimed invention

There is, in fact, no restriction on the kinds of evidence a Patent Examiner may consider in
determining whether a "real-world" utility exists. Indeed, "real-world" evidence, such as evidence
showing actual use or commercial success of the invention, can constitute conclusive proof of utility.
Raytheon v. Roper, 220 USPQ 592 (Fed. Cir. 1983); Nestle v. Eugene, 55 F.2d 854, 856, 12
USPQ 335 (6th Cir. 1932). Moreover, proof that the invention is made, used or sold by any person or
entity other than the patentee is conclusive proof of utility. United States Steel Corp. v. Phillips
Petroleum Co., 865 F.2d 1247, 1252, 9 USPQ2d 1461 (Fed. Cir. 1989).

Over the past several years, a vibrant market has developed for databases containing all 
expressed genes (along with the polypeptide translations of those genes), in particular genes having 
medical and pharmaceutical significance such as the instant sequence. (Note that the value in these 
databases is enhanced by their completeness, but each sequence in them is independently valuable.) 
The databases sold by Applicants' assignee, Incyte, include exactly the kinds of information made 
possible by the claimed invention, such as tissue and disease associations. Incyte sells its database 
containing the claimed sequence and millions of other sequences throughout the scientific community, 
including to pharmaceutical companies who use the information to develop new pharmaceuticals. 

Both Incyte's customers and the scientific community have acknowledged that Incyte's
databases have proven to be valuable in, for example, the identification and development of drug 
candidates. As Incyte adds information to its databases, including the information that can be generated 
only as a result of Incyte's discovery of the claimed polynucleotide and its use of that polynucleotide on 
cDNA microarrays, the databases become even more powerful tools. Thus the claimed invention adds 
more than incremental benefit to the drug discovery and development process. 

III. The Patent Examiner's Rejections Are Without Merit 

Rather than responding to the evidence demonstrating utility, the Examiner attempts to dismiss it 
altogether by arguing that the disclosed and well-established utilities for the claimed polynucleotide are 
not "specific or substantial" utilities. (Office Action at p. 4). The Examiner is incorrect both as a matter 
of law and as a matter of fact. 
A. The Precise Biological Role Or Function Of An Expressed Polynucleotide Is
Not Required To Demonstrate Utility 

The Patent Examiner's primary rejection of the claimed invention is based on the ground that, 
without information as to the precise "biological role" of the claimed invention, the claimed invention's
utility is not sufficiently specific. According to the Examiner, it is not enough that a person of ordinary 
skill in the art could use and, in fact, would want to use the claimed invention either by itself or in a 
cDNA microarray to monitor the expression of genes for such applications as the evaluation of a drug's 
efficacy and toxicity. The Examiner would require, in addition, that the applicant provide a specific and 
substantial interpretation of the results generated in any given expression analysis. 

It may be that specific and substantial interpretations and detailed information on biological 
function are necessary to satisfy the requirements for publication in some technical journals, but they are 
not necessary to satisfy the requirements for obtaining a United States patent. The relevant question is 
not, as the Examiner would have it, whether it is known how or why the invention works, In re
Cortright, 165 F.3d 1353, 1359 (Fed. Cir. 1999), but rather whether the invention provides an
"identifiable benefit" in presently available form. Juicy Whip Inc. v. Orange Bang Inc., 185 F.3d
1364, 1366 (Fed. Cir. 1999). If the benefit exists, and there is a substantial likelihood the invention
provides the benefit, it is useful. There can be no doubt, particularly in view of the Bedilion Declaration
(at, e.g., ¶¶ 10 and 15), that the present invention meets this test.

The threshold for determining whether an invention produces an identifiable benefit is low. 
Juicy Whip, 185 F.3d at 1366. Only those utilities that are so nebulous that a person of ordinary skill 
in the art would not know how to achieve an identifiable benefit and, at least according to the PTO 
guidelines, so-called "throwaway" utilities that are not directed to a person of ordinary skill in the art at
all, do not meet the statutory requirement of utility. Utility Examination Guidelines, 66 Fed. Reg. 1092
(Jan. 5, 2001). 

Knowledge of the biological function or role of a biological molecule has never been required to 
show real-world benefit. In its most recent explanation of its own utility guidelines, the PTO 
acknowledged as much (66 Fed. Reg. at 1095):
[T]he utility of a claimed DNA does not necessarily depend on the function of the
encoded gene product. A claimed DNA may have specific and substantial utility 
because, e.g., it hybridizes near a disease-associated gene or it has gene-regulating 
activity. 

By implicitly requiring knowledge of biological function for any claimed nucleic acid, the 
Examiner has, contrary to law, elevated what is at most an evidentiary factor into an absolute 
requirement of utility. Rather than looking to the biological role or function of the claimed invention, the 
Examiner should have looked first to the benefits it is alleged to provide. 

B. Membership in a Class of Useful Products Can Be Proof of Utility 

Despite the uncontradicted evidence that the claimed polynucleotide encodes a polypeptide in 
the myosin family, the Examiner refused to impute the utility of the members of the myosin family to 
MHCH. In the Office Action, the Patent Examiner takes the position that, unless Applicants can 
identify which particular biological function within the class of myosins is possessed by MHCH, utility 
cannot be imputed. To demonstrate utility by membership in the class of myosins, the Examiner would 
require that all myosins possess a "common" utility. 

There is no such requirement in the law. In order to demonstrate utility by membership in a 
class, the law requires only that the class not contain a substantial number of useless members. So long 
as the class does not contain a substantial number of useless members, there is sufficient likelihood that 
the claimed invention will have utility, and a rejection under 35 U.S.C. § 101 is improper. That is true 
regardless of how the claimed invention ultimately is used and whether or not the members of the class 
possess one utility or many. See Brenner v. Manson, 383 U.S. 519, 532 (1966); Application of 
Kirk, 376 F.2d 936, 943 (CCPA 1967). 

Membership in a "general" class is insufficient to demonstrate utility only if the class contains a 
sufficient number of useless members such that a person of ordinary skill in the art could not impute 
utility by a substantial likelihood. There would be, in that case, a substantial likelihood that the claimed 
invention is one of the useless members of the class. In the few cases in which class membership did 
not prove utility by substantial likelihood, the classes did in fact include predominately useless members. 
E.g., Brenner (man-made steroids); Kirk (same); Natta (man-made polyethylene polymers). 
The Examiner addresses MHCH as if the general class in which it is included is not the myosin
family, but rather all polynucleotides or all polypeptides, including the vast majority of useless theoretical 
molecules not occurring in nature, and thus not pre-selected by nature to be useful. While these 
"general classes" may contain a substantial number of useless members, the myosin family does not. 
The myosin family is sufficiently specific to rule out any reasonable possibility that MHCH would not 
also be useful like the other members of the family. 

Because the Examiner has not presented any evidence that the myosin class of proteins has any,
let alone a substantial number, of useless members, the Examiner must conclude that there is a
"substantial likelihood" that the MHCH encoded by the claimed polynucleotide is useful. It follows that
the claimed polynucleotide also is useful. 

It is undisputed that known members of the myosin family are motor proteins involved in muscle 
contraction, intracellular movement, phagocytosis, and cytokinesis. A person of ordinary skill in the art 
need not know any more about how the claimed invention functions to use it, and the Examiner presents 
no evidence to the contrary. The Examiner then goes on to assume that the only use for MHCH absent 
knowledge as to how the myosin actually works is further study of MHCH itself. 

Not so. As demonstrated by Applicants, knowledge that MHCH is a myosin is more than 
sufficient to make it useful for the diagnosis and treatment of heart and skeletal muscle disorders, 
developmental disorders, and cell proliferative disorders, including cancer. Indeed, MHCH has been 
shown to be expressed in hematopoietic/immune system, gastrointestinal, musculoskeletal, and 
reproductive tissues, and in tissues associated with cancer (Specification at page 18, lines 12-17). The 
Examiner must accept these facts to be true unless the Examiner can provide evidence or sound 
scientific reasoning to the contrary. But the Examiner has not done so. 

C. Because the uses of polynucleotides encoding MHCH in toxicology testing,
drug discovery, and disease diagnosis are practical uses beyond mere study of
the invention itself, the claimed invention has substantial utility

The PTO rejected the claims at issue on the ground that the use of an invention as a tool for
research is not a "substantial" use. Because the PTO's rejection rests on a substantial overstatement of
the law, and is incorrect in fact, it must be overturned.
There is no authority for the proposition that use as a tool for research is not a substantial utility.
Indeed, the Patent Office has recognized that just because an invention is used in a research setting
does not mean that it lacks utility (MPEP § 2107):

Many research tools such as gas chromatographs, screening assays, and nucleotide sequencing 
techniques have a clear, specific and unquestionable utility (e.g., they are useful in analyzing 
compounds). An assessment that focuses on whether an invention is useful only in a research 
setting thus does not address whether the specific invention is in fact "useful" in a patent sense. 
Instead, Office personnel must distinguish between inventions that have a specifically identified 
utility and inventions whose specific utility requires further research to identify or reasonably 
confirm. 

The Patent Office's actual practice has been, at least until the present, consistent with that 
approach. It has routinely issued patents for inventions whose only use is to facilitate research, such as
DNA ligases. These are acknowledged by the PTO's Training Materials themselves to be useful, as 
well as DNA sequences used, for example, as markers. 

Only a limited subset of research uses are not "substantial" utilities: those in which the only 
known use for the claimed invention is to be an object of further study, thus merely inviting further 
research. This follows from Brenner, in which the U.S. Supreme Court held that a process for making 
a compound does not confer a substantial benefit where the only known use of the compound was to 
be the object of further research to determine its use. Id. at 535. Similarly, in Kirk, the court held that
a compound would not confer substantial benefit on the public merely because it might be used to 
synthesize some other, unknown compound that would confer substantial benefit. Kirk, 376 F.2d at 
940, 945 ("What Applicants are really saying to those in the art is take these steroids, experiment, and 
find what use they do have as medicines."). Nowhere do those cases state or imply, however, that a 
material cannot be patentable if it has some other beneficial use in research. 

As used in toxicology testing, drug discovery, and disease diagnosis, the claimed invention has a 
beneficial use in research other than studying the claimed invention or its protein products. It is a tool, 
rather than an object, of research. The data generated in gene expression monitoring using the claimed 
invention as a tool is not used merely to study the claimed polynucleotide itself, but rather to study 
properties of tissues, cells, and potential drug candidates and toxins. Without the claimed invention, the 
information regarding the properties of tissues, cells, drug candidates and toxins is less complete.
(Bedilion Declaration at ¶ 15.)

The claimed invention has numerous additional uses as a research tool, each of which alone is a 
"substantial utility." These include uses such as diagnostic assays (e.g., pages 36-39), chromosomal 
markers (e.g., pages 39-40), and ligand screening assays (e.g., page 40). 

IV. By Requiring the Patent Applicant to Assert a Particular or Unique Utility, the Patent 
Examination Utility Guidelines and Training Materials Applied by the Patent 
Examiner Misstate the Law 

There is an additional, independent reason to overturn the rejections: to the extent the rejections
are based on Revised Interim Utility Examination Guidelines (64 FR 71427, December 21, 1999), the
final Utility Examination Guidelines (66 FR 1092, January 5, 2001) and/or the Revised Interim Utility
Guidelines Training Materials (USPTO Website www.uspto.gov, March 1, 2000), the Guidelines and
Training Materials are themselves inconsistent with the law. 

The Training Materials, which direct the Examiners regarding how to apply the Utility
Guidelines, address the issue of specificity with reference to two kinds of asserted utilities: "specific"
utilities, which meet the statutory requirements, and "general" utilities, which do not. The Training
Materials define a "specific utility" as follows:

A [specific utility] is specific to the subject matter claimed. This contrasts to general utility that 
would be applicable to the broad class of invention. For example, a claim to a polynucleotide 
whose use is disclosed simply as "gene probe" or "chromosome marker" would not be 
considered to be specific in the absence of a disclosure of a specific DNA target. Similarly, a 
general statement of diagnostic utility, such as diagnosing an unspecified disease, would 
ordinarily be insufficient absent a disclosure of what condition can be diagnosed. 

The Training Materials distinguish between "specific" and "general" utilities by assessing
whether the asserted utility is sufficiently "particular," i.e., unique (Training Materials at p. 52) as
compared to the "broad class of invention." (In this regard, the Training Materials appear to parallel
the view set forth in Stephen G. Kunin, Written Description Guidelines and Utility Guidelines, 82
J.P.T.O.S. 77, 97 (Feb. 2000) ("With regard to the issue of specific utility the question to ask is
whether or not a utility set forth in the specification is particular to the claimed invention.")).
Such "unique" or "particular" utilities never have been required by the law. To meet the utility
requirement, the invention need only be "practically useful," Natta, 480 F.2d at 1397, and confer a
"specific benefit" on the public. Brenner, 383 U.S. at 534. Thus, incredible "throwaway" utilities, such
as trying to "patent a transgenic mouse by saying it makes great snake food," do not meet this standard.
Karen Hall, Genomic Warfare, The American Lawyer 68 (June 2000) (quoting John Doll, Chief of the
Biotech Section of USPTO).

This does not preclude, however, a general utility, contrary to the statement in the Training 
Materials where "specific utility" is defined (page 5). Practical real-world uses are not limited to uses 
that are unique to an invention. The law requires that the practical utility be "definite," not particular. 
Montedison, 664 F.2d at 375. Applicants are not aware of any court that has rejected an assertion of
utility on the grounds that it is not "particular" or "unique" to the specific invention. Where courts have 
found utility to be too "general," it has been in those cases in which the asserted utility in the patent 
disclosure was not a practical use that conferred a specific benefit. That is, a person of ordinary skill in 
the art would have been left to guess as to how to benefit at all from the invention. In Kirk, for 
example, the CCPA held the assertion that a man-made steroid had "useful biological activity" was 
insufficient where there was no information in the specification as to how that biological activity could be 
practically used. Kirk, 376 F.2d at 941. 

The fact that an invention can have a particular use does not provide a basis for requiring a 
particular use. See Brana, supra (disclosure describing a claimed antitumor compound as being 
homologous to an antitumor compound having activity against a "particular" type of cancer was 
determined to satisfy the specificity requirement). "Particularity" is not and never has been the sine qua 
non of utility; it is, at most, one of many factors to be considered.

As described supra, broad classes of inventions can satisfy the utility requirement so long as a 
person of ordinary skill in the art would understand how to achieve a practical benefit from knowledge 
of the class. Only classes that encompass a significant portion of nonuseful members would fail to meet 
the utility requirement. Supra § II.B.2 (Montedison, 664 F.2d at 374-75).

The Training Materials fail to distinguish between broad classes that convey information of 
practical utility and those that do not, lumping all of them into the latter, unpatentable category of 
"general" utilities. As a result, the Training Materials paint with too broad a brush. Rigorously applied, 
they would render unpatentable whole categories of inventions that heretofore have been considered to
be patentable and that have indisputably benefitted the public, including the claimed invention. See
supra § II.B. Thus the Training Materials cannot be applied consistently with the law.

V. To the Extent the Rejection of the Patented Invention under 35 U.S.C. § 112, First
Paragraph, Is Based on the Improper Rejection for Lack of Utility under 35 U.S.C.
§ 101, It Must Be Reversed

The rejection set forth in the Office Action is based on the assertions discussed above, i.e., that 
the claimed invention lacks patentable utility. To the extent that the rejection under § 112, first 
paragraph, is based on the improper allegation of lack of patentable utility under § 101, it fails for the 
same reasons. 

Enablement rejections under 35 U.S.C. § 112, first paragraph

Original claims 3, 4, 9, and 10, now replaced by new claims 23-25, 30, and 31, are rejected 
for allegedly failing to meet the requirements of 35 U.S.C. § 112, first paragraph, on the grounds that 
the Specification does not provide an enabling disclosure commensurate in scope with the claims 
(Office Action pages 4-5). In particular, the Examiner asserts that "searching for the specific 
nucleotides to change (deletion, insertion, substitution, or combinations thereof) in a polynucleotide to 
make any polynucleotide of any nucleotide sequence having 70% identity to any polynucleotide 
encoding SEQ ID NO:1 or any fragment thereof or any polynucleotide having 70% identity to SEQ ID
NO:2 or any fragment thereof is well outside the realm of routine experimentation and predictability in
the art..." (Office Action, page 5). Applicants traverse the rejection for at least the following
reasons. 

As set forth in In re Marzocchi, 169 USPQ 367, 369 (CCPA 1971): 

The first paragraph of § 112 requires nothing more than objective enablement. How
such a teaching is set forth, either by the use of illustrative examples or by broad 
terminology, is of no importance. 

As a matter of Patent Office practice, then, a specification disclosure which contains a 
teaching of the manner and process of making and using the invention in terms which 
correspond in scope to those used in describing and defining the subject matter sought 
to be patented must be taken as in compliance with the enabling requirement of the first
paragraph of § 112 unless there is reason to doubt the objective truth of the statements
contained therein which must be relied on for enabling support. 

Applicants submit that the disclosure amply enables the claimed invention. First, Applicants 
respectfully point out that the claims of the instant application are drawn to naturally-occurring 
variants. Thus it is not necessary to screen every conceivable variant which might be made using 
recombinant methods, as all that is claimed are those variant sequences which are found in nature. 
Given the sequences of SEQ ID NO:1 and SEQ ID NO:2, one of ordinary skill in the art could readily
identify a polynucleotide encoding a polypeptide comprising a naturally occurring amino acid sequence
at least 90% identical to an amino acid sequence of SEQ ID NO:1 or a polynucleotide comprising a
naturally occurring polynucleotide sequence at least 70% identical to a polynucleotide sequence of SEQ
ID NO:2, using well known methods of sequence analysis without any undue experimentation. For
example, the identification of relevant polynucleotides could be performed by hybridization and/or PCR
techniques that were well-known to those skilled in the art at the time the subject application was filed
and/or described throughout the Specification of the instant application. See, e.g., page 12, line 13
through page 13, line 9; page 25, lines 2-6 and 18-28; and Example VI at pages 45-46. Thus, one
skilled in the art need not make and test vast numbers of polynucleotides. Instead, one skilled in the art
need only screen a cDNA library or use appropriate PCR conditions to identify relevant
polynucleotides that already exist in nature. The skilled artisan would also know how to use the 
claimed polynucleotides, for example in expression profiling, disease diagnosis, or detection of related 
sequences as discussed above. 

The specification also describes the expression vectors into which the claimed fragments could 
be inserted, and the construction of fusion proteins (pages 22-24 and page 47, line 8 through page 48, 
line 3). The specification describes, for example, specific assays for myosin activity on page 48; 
binding assays to detect molecular interactions of "MHCH or biologically active fragments thereof" on
page 50, lines 4-19; and immunological methods for detecting and measuring MHCH on page 25, lines
7-16. These methods could be used to detect and characterize peptide variants and fragments of SEQ
ID NO:1. Given this guidance, one of ordinary skill in the art would readily understand how to select
and screen polynucleotides encoding fragments of SEQ ID NO:1 with ATPase activity or immunogenic
activity without any undue experimentation. 
Furthermore, the claims are directed to polynucleotides, not polypeptides, and it is the
functionality of the claimed polynucleotides, not the polypeptides encoded by them, that is relevant. 
Members of the claimed genus of variants may include, for example, mutant alleles associated with 
diseases, or single nucleotide polymorphisms (SNPs). Members of the claimed genus of variants may 
be useful even if they encode defective MHCH polypeptides. For example, the variant polynucleotides 
could be used for the detection of sequences related to MHCH (see the specification at page 25, lines 
17-28, and page 36, lines 24-30) including MHCH variants that may be associated with disease states,
such as the diseases listed on page 27, line 16 through page 28, line 3, of the specification. See the 
specification at, for example, pages 36-40 for disclosure of how to use the claimed sequences in 
diagnostic assays. 

The Examiner has cited Attwood et al., identifying some of the difficulties that may be involved 
in predicting protein function; however, this reference does not suggest that functional homology cannot 
be inferred by a reasonable probability in this case. At most, this article suggests that it is difficult to 
make predictions about function with certainty. The standard applicable in this case is not proof to 
certainty, but rather, proof to a reasonable probability. In fact, Attwood et al. point out the value of 
sequence analysis, in particular with regard to the identification of conserved motifs in proteins. 
"Because motifs usually reflect some vital structural or functional role (Fig. 2), they effectively provide 
diagnostic family signatures" (Emphasis added; Attwood et al. p. 332, col. 2). 

An analysis of the sequence of SEQ ID NO:1 shows that it contains conserved residues and
structural motifs characteristic of members of the myosin family. For example, the specification shows
alignments of SEQ ID NO:1 with C. elegans myosin I heavy chain and H. annuus unconventional
myosin heavy chain, and points out regions of homology and conserved amino acid residues in the three
proteins (Specification at Figure 2). The specification, on page 17, line 26 through page 18, line 9,
identifies specific residues and myosin signatures within SEQ ID NO:1, including the myosin head
domain, which is known to contain the ATPase activity and actin binding sites in myosin motor proteins. 
The Examiner's attention is directed to Exhibit A which shows the identification of the myosin motor 
head domain in SEQ ID NO: 1 by HMMER analysis of the PFAM database. At the time of filing of the 
instant application, the crystallographic structure of a myosin motor head was available to assist one of
skill in the art in the determination of "specific catalytic residues and structural motifs," particularly 
those critical for ATPase activity and actin binding (see the enclosed reference of Rayment et al.
(1993) Science 261:50-58).

Further, the Examiner requires working examples (Office Action, page 4). There is no such
requirement under the law to provide "working examples." As set forth in In re Borkowski, 164
USPQ 642, 645 (CCPA 1970) (footnote omitted):

However, as we have stated in a number of opinions, a specification need not contain a 
working example if the invention is otherwise disclosed in such a manner that one skilled in the 
art will be able to practice it without an undue amount of experimentation. 

See also M.P.E.P. § 2164.02 as follows:

Compliance with the enablement requirement of 35 U.S.C. 112, first paragraph, does not turn
on whether an example is disclosed. An example may be "working" or "prophetic"... A 
prophetic example describes an embodiment of the invention based on predicted results rather 
than work actually conducted or results actually achieved. 

Thus, there is no requirement under the law to provide "working examples" of what is claimed. 
Rather, one looks to whether the specification provides a description of how to make what is claimed. 
The present specification provides the requisite description. 

Contrary to the standard set forth in Marzocchi and Borkowski, the Examiner has failed to 
provide any reasons why one would doubt that the guidance provided by the present specification 
would enable one to make and use the recited polynucleotides. Hence, a prima facie case for non- 
enablement has not been established. For at least the above reasons, withdrawal of the enablement 
rejections under 35 U.S.C. § 112, first paragraph, is respectfully requested. 

Written description rejections under 35 U.S.C. § 112, first paragraph

Original claims 3-6 and 9-14, now replaced by new claims 23-31, have been rejected under the
first paragraph of 35 U.S.C. § 112 for alleged lack of an adequate written description. This rejection is
respectfully traversed. 

The requirements necessary to fulfill the written description requirement of 35 U.S.C. 112, first
paragraph, are well established by case law.

... the applicant must also convey with reasonable clarity to those skilled in 
the art that, as of the filing date sought, he or she was in possession of the invention. 
The invention is, for purposes of the "written description" inquiry, whatever is now
claimed. Vas-Cath, Inc. v. Mahurkar, 19 USPQ2d 1111, 1117 (Fed. Cir. 1991) 

Attention is also drawn to the Patent and Trademark Office's own "Guidelines for Examination
of Patent Applications Under the 35 U.S.C. Sec. 112, para. 1", published January 5, 2001, which
provide that:

An applicant may also show that an invention is complete by disclosure of sufficiently
detailed, relevant identifying characteristics which provide evidence that applicant was
in possession of the claimed invention, i.e., complete or partial structure, other physical
and/or chemical properties, functional characteristics when coupled with a known or
disclosed correlation between function and structure, or some combination of such
characteristics. What is conventional or well known to one of ordinary skill in the art
need not be disclosed in detail. If a skilled artisan would have understood the inventor
to be in possession of the claimed invention at the time of filing, even if every nuance of
the claims is not explicitly described in the specification, then the adequate description
requirement is met.

Thus, the written description standard is fulfilled by both what is specifically disclosed and what 
is conventional or well known to one skilled in the art. 

SEQ ID NO:1 and SEQ ID NO:2 are specifically disclosed in the application (see, for
example, page 17, lines 19-34). Variants of SEQ ID NO:1 and SEQ ID NO:2 are described, for
example, at page 18, lines 18-33. Incyte clones in which the nucleic acids encoding the human myosin
heavy chain homolog were first identified and libraries from which those clones were isolated are
described, for example, at page 17, lines 19-25 of the Specification. Chemical and structural features
of SEQ ID NO:1 are described, for example, on page 17, line 26 through page 18, line 9. Given SEQ
ID NO:1, one of ordinary skill in the art would recognize naturally-occurring variants of SEQ ID NO:1
having 90% sequence identity to SEQ ID NO:1. Given SEQ ID NO:2, one of ordinary skill in the art
would recognize naturally-occurring variants of SEQ ID NO:2 having 70% sequence identity to SEQ
ID NO:2. Accordingly, the Specification provides an adequate written description of the recited
polypeptide sequences. 
A. The Specification provides an adequate written description of the claimed
variants and fragments of SEQ ID NO:2.

The Office Action has further asserted that the claims are not supported by an adequate written
description because the specification "only provides the following representative species encompassed
by these claims: a polynucleotide consisting of the nucleotide sequence of SEQ ID NO:2 and a
polynucleotide encoding a polypeptide consisting of the amino acid sequence of SEQ ID NO:1....
Given this lack of additional representative species as encompassed by the claims, Applicants have
failed to sufficiently describe the claimed invention..."

Such a position is believed to present a misapplication of the law. 

1. The present claims specifically define the claimed genus through the recitation 
of chemical structure 

Court cases in which "DNA claims" have been at issue commonly emphasize that the recitation
of structural features or of chemical or physical properties is an important factor to consider in a written
description analysis of such claims. For example, in Fiers v. Revel, 25 USPQ2d 1601, 1606 (Fed.
Cir. 1993), the court stated that:

If a conception of a DNA requires a precise definition, such as by structure, formula,
chemical name or physical properties, as we have held, then a description also requires
that degree of specificity.

In a number of instances in which claims to DNA have been found invalid, the courts have
noted that the claims attempted to define the claimed DNA in terms of functional characteristics without
any reference to structural features. As set forth by the court in University of California v. Eli Lilly
and Co., 43 USPQ2d 1398, 1406 (Fed. Cir. 1997):

In claims to genetic material, however, a generic statement such as "vertebrate insulin 
cDNA" or "mammalian insulin cDNA," without more, is not an adequate written 
description of the genus because it does not distinguish the claimed genus from others, 
except by function. 

Thus, the mere recitation of functional characteristics of a DNA, without the definition of 
structural features, has been a common basis by which courts have found invalid claims to DNA. For 
example, in Lilly, 43 USPQ2d at 1407, the court found invalid for violation of the written description 
requirement the following claim of U.S. Patent No. 4,652,525: 
1. A recombinant plasmid replicable in procaryotic host containing within its nucleotide
sequence a subsequence having the structure of the reverse transcript of an mRNA of a
vertebrate, which mRNA encodes insulin.

In Fiers, 25 USPQ2d at 1603, the parties were in an interference involving the following count: 

A DNA which consists essentially of a DNA which codes for a human fibroblast 
interferon-beta polypeptide. 

Party Revel in the Fiers case argued that its foreign priority application contained an adequate 
written description of the DNA of the count because that application mentioned a potential method for 
isolating the DNA. The Revel priority application, however, did not have a description of any particular 
DNA structure corresponding to the DNA of the count. The court therefore found that the Revel 
priority application lacked an adequate written description of the subject matter of the count. 

Thus, in Lilly and Fiers, nucleic acids were defined on the basis of functional characteristics
and were found not to comply with the written description requirement of 35 U.S.C. §112, i.e., "an
mRNA of a vertebrate, which mRNA encodes insulin" in Lilly, and "DNA which codes for a human
fibroblast interferon-beta polypeptide" in Fiers. In contrast to the situation in Lilly and Fiers, the
claims at issue in the present application define polynucleotides in terms of chemical structure, rather
than functional characteristics. For example, the "variant language" of independent claim 30 recites
chemical structure to define the claimed genus:

30. An isolated polynucleotide selected from the group consisting of:... 

b) a polynucleotide comprising a naturally occurring polynucleotide sequence at 
least 70% identical to a polynucleotide sequence of SEQ ID NO:2... 

From the above it should be apparent that the claims of the subject application are
fundamentally different from those found invalid in Lilly and Fiers. The subject matter of the present
claims is defined in terms of the chemical structure of SEQ ID NO:2. In the present case, there is no
reliance merely on a description of functional characteristics of the polynucleotides recited by the
claims. In fact, there is no recitation of functional characteristics. Moreover, if such functional
recitations were included, they would add to the structural characterization of the recited polynucleotides.
The polynucleotides defined in the claims of the present application recite structural features, and cases
such as Lilly and Fiers stress that the recitation of structure is an important factor to consider in a
written description analysis of claims of this type. By failing to base its written description inquiry "on
whatever is now claimed," the Office Action failed to provide an appropriate analysis of the present
claims and how they differ from those found not to satisfy the written description requirement in Lilly
and Fiers.

2. The present claims do not define a genus which is "highly variant"

Furthermore, the claims at issue do not describe a genus which could be characterized as 
"highly variant." Available evidence illustrates that the claimed genus is of narrow scope. 

In support of this assertion, the Examiner's attention is directed to the enclosed reference by 
Brenner et al. ("Assessing sequence comparison methods with reliable structurally identified distant 
evolutionary relationships," Proc. Natl. Acad. Sci. USA (1998) 95:6073-6078). Through exhaustive 
analysis of a data set of proteins with known structural and functional relationships and with <90% 
overall sequence identity, Brenner et al. have determined that 30% identity is a reliable threshold for 
establishing evolutionary homology between two sequences aligned over at least 150 residues. 
(Brenner et al., pages 6073 and 6076.) Furthermore, local identity is particularly important in this case
for assessing the significance of the alignments, as Brenner et al. further report that ≥40% identity over
at least 70 residues is reliable in signifying homology between proteins. (Brenner et al., page 6076.)

The present application is directed, inter alia, to myosin proteins related to the amino acid
sequence of SEQ ID NO:1. In accordance with Brenner et al., naturally occurring molecules may exist
which could be characterized as myosin proteins and which have as little as 40% identity over at least
70 residues to SEQ ID NO:1. The "variant language" of the present claims recites, for example, a
polynucleotide encoding "a polypeptide comprising a naturally occurring amino acid sequence at least
90% identical to an amino acid sequence of SEQ ID NO:1" and "a polynucleotide comprising a
naturally occurring polynucleotide sequence at least 70% identical to a polynucleotide sequence of SEQ
ID NO:2" (note that SEQ ID NO:1 has 612 amino acid residues). This variation is far less than that of
all potential myosin proteins related to SEQ ID NO:1, i.e., those myosin proteins having as little as 40%
identity over at least 70 residues to SEQ ID NO:1.
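Purely to illustrate the arithmetic behind these identity thresholds, the following Python sketch computes percent identity for two pre-aligned sequences and the minimum number of identical positions a given threshold requires. It assumes that percent identity is taken over every column of a gapped pairwise alignment (gap columns count in the denominator); other conventions exist, and the function names are hypothetical rather than those of any particular alignment program.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns where both sequences carry the same residue.

    Assumes both strings come from one pairwise alignment ('-' marks a gap),
    so they must have equal length; gap columns count in the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def min_identical(length: int, percent: int) -> int:
    """Smallest count of identical positions giving at least `percent`% identity."""
    # Integer ceiling of percent * length / 100, avoiding floating-point rounding.
    return (percent * length + 99) // 100


# 90% identity over the 612 residues of SEQ ID NO:1
needed = min_identical(612, 90)
print(needed, 612 - needed)   # 551 61
# Brenner et al. threshold: 40% identity over 70 residues
print(min_identical(70, 40))  # 28
```

On this assumption, a variant aligned over the full length of SEQ ID NO:1 could differ at no more than 61 positions, whereas the Brenner et al. threshold requires only 28 identical residues within a 70-residue window.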

3. The state of the art at the time of the present invention is further advanced than 
at the time of the Lilly and Fiers applications 

In the Lilly case, claims of U.S. Patent No. 4,652,525 were found invalid for failing to comply
with the written description requirement of 35 U.S.C. §112. The '525 patent claimed the benefit of
priority of two applications, Application Serial No. 801,343 filed May 27, 1977, and Application Serial
No. 805,023 filed June 9, 1977. In the Fiers case, party Revel claimed the benefit of priority of an
Israeli application filed on November 21, 1979. Thus, the written description inquiry in those cases was
based on the state of the art at essentially the "dark ages" of recombinant DNA technology.

The present application has a priority date of November 5, 1998. Much has happened in the
development of recombinant DNA technology in the 19 or more years between the filing of the
applications involved in Lilly and Fiers and the filing of the present application. For example, the
technique of polymerase chain reaction (PCR) was invented. Highly efficient cloning and DNA sequencing
technology has been developed. Large databases of protein and nucleotide sequences have been
compiled. Much of the raw material of the human and other genomes has been sequenced. With these
remarkable advances one of skill in the art would recognize that, given the sequence information of 
SEQ ID N0:1 and SEQ ID N0:2, and the additional extensive detail provided by the subject 
application, the present inventors were in possession of the claimed polynucleotide variants at the time 
of filing of this application. 

4. Summary 

The Office Action failed to base its written description inquiry "on whatever is now claimed."
Consequently, the Action did not provide an appropriate analysis of the present claims and how they
differ from those found not to satisfy the written description requirement in cases such as Lilly and
Fiers. In particular, the claims of the subject application are fundamentally different from those found
invalid in Lilly and Fiers. The subject matter of the present claims is defined in terms of the chemical
structure of SEQ ID NO:1 or SEQ ID NO:2. The courts have stressed that structural features are
important factors to consider in a written description analysis of claims to nucleic acids and proteins. In
addition, the genus of polynucleotides defined by the present claims is adequately described, as
evidenced by Brenner et al. and consideration of the claims of the '740 patent involved in Lilly.
Furthermore, there have been remarkable advances in the state of the art since the Lilly and Fiers
cases, and these advances were given no consideration whatsoever in the position set forth by the
Office Action.
Rejection under 35 U.S.C. § 112, second paragraph

Claims 4 and 5 have been rejected under 35 U.S.C. § 112, second paragraph, as allegedly 
being "indefinite for failing to particularly point out and distinctly claim the subject matter which 
applicant regards as the invention" (Office Action, page 6). Claim 5 has been canceled. Therefore, the 
rejection with respect to this claim is moot. 

Original claim 4 was allegedly indefinite because "the specific nucleotide sequence of the
polynucleotide to which the claimed polynucleotide has 70% identity is not known and not stated in the
claim." New claims 23 and 30 now replace claim 4. Claim 23 recites an isolated polynucleotide
encoding a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to
an amino acid sequence of SEQ ID NO:1. Claim 30 recites an isolated polynucleotide comprising a
naturally occurring polynucleotide sequence at least 70% identical to a polynucleotide sequence of SEQ
ID NO:2. Given the sequences of SEQ ID NO:1 and SEQ ID NO:2, which are disclosed in the
instant application, one of skill in the art could readily understand the scope of the claimed invention.
Therefore, withdrawal of the rejection under 35 U.S.C. § 112, second paragraph, is respectfully
requested.

Rejection under 35 U.S.C. § 102

Original claims 3 and 9, now replaced by claims 23 and 31, are rejected under 35 U.S.C. § 
102 as allegedly being anticipated by the references of Calabretta et al. (U.S. Patent No. 5,734,039) 
and Dahlberg et al. (U.S. Patent No. 5,541,311) on the grounds that the references teach the claimed 
polynucleotide fragments. 

As currently pending, claim 23 recites a polynucleotide encoding a biologically active fragment
of a polypeptide having an amino acid sequence of SEQ ID NO:1, wherein said fragment has ATPase
activity, and a polynucleotide encoding an immunogenic fragment of a polypeptide consisting of an
amino acid sequence of SEQ ID NO:1, wherein said fragment comprises at least 15 contiguous amino
acid residues of SEQ ID NO:1. Claim 31 recites an isolated polynucleotide consisting of at least 25
contiguous nucleotides of SEQ ID NO:2, or the complement thereof. Support for the new claims can
be found in the specification, for example, at page 8, lines 36-41, which defines the term "fragment,"
at page 17, line 26, through page 18, line 9, which describes the homology between MHCH and myosin,
and at page 48, lines 5-22, which describes assays for myosin ATPase activity.
The polynucleotide sequence disclosed by the Calabretta reference does not encode a
polypeptide containing 15 contiguous amino acid residues of SEQ ID NO:1, nor a biologically active
fragment of SEQ ID NO:1 having ATPase activity. The polynucleotide sequence disclosed by the
reference of Dahlberg et al. does not contain 25 contiguous nucleotides of SEQ ID NO:2. Therefore,
the references do not disclose the claimed polynucleotide fragments, and Applicants respectfully
request withdrawal of the rejections under 35 U.S.C. § 102.
CONCLUSION

In light of the above amendments and remarks, Applicants submit that the present application is
fully in condition for allowance, and request that the Examiner withdraw the outstanding rejections. 
Early notice to that effect is earnestly solicited. 

If the Examiner contemplates other action, or if a telephone conference would expedite
allowance of the claims, Applicants invite the Examiner to contact Applicants' Attorney at
(650) 855-0555. 

Applicants believe that no fee is due with this communication. However, if the USPTO 
determines that a fee is due, the Commissioner is hereby authorized to charge Deposit Account No. 
09-0108. 

Respectfully submitted, 
INCYTE CORPORATION 

Jenliy Buchbmtier 
Reg. No. 48,58!8 

Direct Dial Telephone: (650) 843-7212

Date:

Kathleen M. Rocco
Reg. No. 46,172
Direct Dial Telephone: (650) 845-4587

Date:



Customer No.: 27904 
3160 Porter Drive 
Palo Alto, California 94304 
Differential gene expression in drug metabolism and
toxicology: practicalities, problems and potential
1. An important feature of the work of many molecular biologists is identifying which
genes are switched on and off in a cell under different environmental conditions or
subsequent to xenobiotic challenge. Such information has many uses, including the
deciphering of molecular pathways and facilitating the development of new experimental
and diagnostic procedures. However, the student of gene hunting should be forgiven for
perhaps becoming confused by the mountain of information available, as there appear to be
as many methods of discovering differentially expressed genes as there are research
groups using the technique.

2. The aim of this review was to clarify the main methods of differential gene expression
analysis and the mechanistic principles underlying them. Also included is a discussion on
some of the practical aspects of using this technique. Emphasis is placed on the so-called
'open' systems, which require no prior knowledge of the genes contained within the study
model. Whilst these will eventually be replaced by 'closed' systems in the study of human,
mouse and other commonly studied laboratory animals, they will remain a powerful tool for
those examining less fashionable models.

3. The use of suppression-PCR subtractive hybridization is exemplified in the
identification of up- and down-regulated genes in rat liver following exposure to
phenobarbital, a well-known inducer of the drug metabolizing enzymes.

4. Differential gene display provides a coherent platform for building libraries and
microchip arrays of 'gene fingerprints' characteristic of known enzyme inducers and
xenobiotic toxicants, which may be interrogated subsequently for the identification and
characterization of xenobiotics of unknown biological properties.






Introduction 

It is now apparent that the development of almost all cancers and many non-neoplastic
diseases is accompanied by altered gene expression in the affected cells compared to
their normal state (Hunter 1991, Wynford-Thomas 1991, Vogelstein and Kinzler 1993,
Semenza 1994, Cassidy 1995, Kleinjan and van Heyningen 1998). Such changes also
occur in response to external stimuli such as pathogenic micro-organisms (Rohn et al.
1996, Singh et al. 1997, Griffin and Krishna 1998, Lunney 1998) and xenobiotics
(Sewall et al. 1995, Dogra et al. 1998, Ramana and Kohli 1998), as well as during the
development of undifferentiated cells (Hecht 1998, Rudin and Thompson 1998,
Schneider-Maunoury et al. 1998). The potential medical and therapeutic benefits of
understanding the molecular changes which occur in any given cell in progressing from
the normal to the 'altered' state are enormous. Such profiling essentially provides a
'fingerprint' of each step of a cell's development or response and should help in the
elucidation of specific and sensitive biomarkers representing, for example, different
types of cancer or previous exposure to certain classes of chemicals that are enzyme
inducers.

In drug metabolism, many of the xenobiotic-metabolizing enzymes (including
the well-characterized isoforms of cytochrome P450) are inducible by drugs and
chemicals in man (Pelkonen et al. 1998), predominantly involving transcriptional
activation not only of the cognate cytochrome P450 genes, but also of additional cellular
proteins which may be crucial to the phenomenon of induction. Accordingly, the
development of methodology to identify and assess the full complement of genes
that are either up- or down-regulated by inducers is crucial in the development of
knowledge to understand the precise molecular mechanisms of enzyme induction
and how this relates to drug action. Similarly, in the field of chemical-induced
toxicity, it is now becoming increasingly obvious that most adverse reactions to
drugs and chemicals are the result of multiple gene regulation, some of which are
causal and some of which are casually related to the toxicological phenomenon per
se. This observation has led to an upsurge in interest in gene-profiling technologies
which differentiate between the control and toxin-treated gene pools in target tissues
and are, therefore, of value in rationalizing the molecular mechanisms of
xenobiotic-induced toxicity. Knowledge of toxin-dependent gene regulation in target tissues is
not solely an academic pursuit, as much interest has been generated in the
pharmaceutical industry to harness this technology in the early identification of toxic
drug candidates, thereby shortening the developmental process and contributing
substantially to the safety assessment of new drugs. For example, if the gene profile
in response to, say, a testicular toxin that has been well characterized in vivo could be
determined in the testis, then this profile would be representative of all new drug
candidates which act via this specific molecular mechanism of toxicity, thereby
providing a useful and coherent approach to the early detection of such toxicants.
Whereas it would be informative to know the identity and functionality of all genes
up- or down-regulated by such toxicants, this would appear a longer-term goal, as the
majority of human genes have not yet been sequenced, far less their functionality
determined. However, the current use of gene profiling yields a pattern of gene
changes for a xenobiotic of unknown toxicity which may be matched to that of
well-characterized toxins, thus alerting the toxicologist to possible in vivo similarities
between the unknown and the standard, thereby providing a platform for more
extensive toxicological examination. Such approaches are beginning to gain
momentum, in that several biotechnology companies are commercially producing
'gene chips' or 'gene arrays' that may be interrogated for toxicity assessment of
xenobiotics. These chips consist of hundreds or thousands of genes, some of which are
degenerate in the sense that not all of the genes are mechanistically related to any
one toxicological phenomenon. Whereas these chips are useful in broad-spectrum
screening, they are maturing at a substantial rate, in that gene arrays are now
becoming more specific, e.g. chips for the identification of changes in growth factor
families that contribute to the aetiology and development of chemically-induced
neoplasias.

Although documenting and explaining these genetic changes presents a
formidable obstacle to understanding the different mechanisms of development and
disease progression, the technology is now available to begin attempting this difficult
challenge. Indeed, several 'differential expression analysis' methods have been
developed which facilitate the identification of gene products that demonstrate
altered expression in cells of one population compared to another. These methods
have been used to identify differential gene expression in many situations, including
invading pathogenic microbes (Zhao et al. 1998), cells responding to extracellular
and intracellular microbial invasion (Duguid and Dinauer 1990, Ragno et al. 1997,
Maldarelli et al. 1998), chemically treated cells (Syed et al. 1997, Rockett et al.
1999), neoplastic cells (Liang et al. 1992, Chang and Terzaghi-Howe 1998),
activated cells (Gurskaya et al. 1996, Wan et al. 1996), differentiated cells (Hara et
al. 1991, Guimaraes et al. 1995a, b), and different cell types (Davis et al. 1984,
Hedrick et al. 1984, Xhu et al. 1998). Although differential expression analysis
technologies are applicable to a broad range of models, perhaps their most important
advantage is that, in most cases, absolutely no prior knowledge of the specific genes
which are up- or down-regulated is required.

The field of differential expression analysis is a large and complex one, with
many techniques available to the potential user. These can be categorized into
several methodological approaches, including:

(1) Differential screening,

(2) Subtractive hybridization (SH) (includes methods such as chemical cross-linking
subtraction (CCLS), suppression-PCR subtractive hybridization (SSH) and
representational difference analysis (RDA)),

(3) Differential display (DD),

(4) Restriction endonuclease facilitated analysis (including serial analysis of gene
expression (SAGE) and gene expression fingerprinting (GEF)),

(5) Gene expression arrays, and 

(6) Expressed sequence tag (EST) analysis. 

The above approaches have been used successfully to isolate differentially 
expressed genes in different model systems. However, each method has its own 
subtle (and sometimes not so subtle) characteristics which incur various advantages 
and disadvantages. Accordingly, it is the purpose of this review to clarify the 
mechanistic principles underlying the main differential expression methods and to 
highlight some of the broader considerations and implications of this very powerful 
and increasingly popular technique. Specifically, we will concentrate on the so-called
'open' systems, namely those which do not require any knowledge of gene
sequences and, therefore, are useful for isolating unknown genes. Two 'closed'
systems (those utilising previously identified gene sequences), EST analysis and the
use of DNA arrays, will also be considered briefly for completeness. Whilst
emphasis will often be placed on suppression PCR subtractive hybridization (SSH,
the approach employed in this laboratory), it is the aim of the authors to highlight,
wherever possible, those areas of common interest to those who use, or intend to use,
differential gene expression analysis.



Differential cDNA library screening (DS)

Despite the development of multiple technological advances which have recently
brought the field of gene expression profiling to the forefront of molecular analysis,
recognition of the importance of differential gene expression and characterization of
differentially expressed genes has existed for many years. One of the original
approaches used to identify such genes was described 20 years ago by St John and
Davis (1979). These authors developed a method, termed 'differential plaque filter
hybridization', which was used to isolate galactose-inducible DNA sequences from
yeast. The theory is simple: a genomic DNA library is prepared from normal,
unstimulated cells of the test organism/tissue and multiple filter replicas are
prepared. These replica blots are probed with radioactively (or otherwise) labelled
complex cDNA probes prepared from the control and test cell mRNA populations.
Those mRNAs which are differentially expressed in the treated cell population will
show a positive signal only on the filter probed with cDNA from the treated cells.
Furthermore, labelled cDNA from different test conditions can be used to probe
multiple blots, thereby enabling the identification of mRNAs which are only
up-regulated under certain conditions. For example, St John and Davis (1979) screened
replica filters with acetate-, glucose- and galactose-derived probes in order to obtain
genes induced specifically by galactose metabolism. Although groundbreaking in its
time, this method is now considered insensitive and time-consuming, as up to 2
months are required to complete the identification of genes which are differentially
expressed in the test population. In addition, there is no convenient way to check
that the procedure has worked until the whole process has been completed.
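The selection logic of differential plaque filter hybridization can be sketched as a set operation: a clone of interest gives a positive signal on the replica probed with cDNA from the condition of interest and no signal on any other replica. The Python below is a minimal illustration under that assumption; the plaque identifiers and hybridization results are hypothetical, and a real screen must also deal with weak or ambiguous signals.

```python
from __future__ import annotations


def condition_specific_clones(hits: dict[str, set[str]], target: str) -> set[str]:
    """Clones positive only on the replica probed with the `target` cDNA.

    `hits` maps each probe (condition) to the set of plaque IDs that gave a
    positive signal on its replica filter.
    """
    specific = set(hits[target])
    for condition, positives in hits.items():
        if condition != target:
            specific -= positives  # discard clones that also light up elsewhere
    return specific


# Hypothetical positives on three replica filters, as in a galactose screen
hits = {
    "acetate": {"p01", "p02", "p05"},
    "glucose": {"p01", "p03", "p05"},
    "galactose": {"p01", "p04", "p05", "p07"},
}
print(sorted(condition_specific_clones(hits, "galactose")))  # ['p04', 'p07']
```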

Subtractive Hybridization (SH) 

The developing concept of differential gene expression and the success of early
approaches such as that described by St John and Davis (1979) soon gave rise to a
search for more convenient methods of analysis. One of the first to be developed was
SH, numerous variations of which have since been reported (see below). In general,
this approach involves hybridization of mRNA/cDNA from one population (tester)
to excess mRNA/cDNA from another (driver), followed by separation of the
unhybridized tester fraction (differentially expressed) from the hybridized common
sequences. This step has been achieved physically, chemically and through the use
of selective polymerase chain reaction (PCR) techniques.
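To make the tester/driver logic concrete, the following sketch treats each population as a table of transcript copy numbers and assumes idealized, complete hybridization: every tester copy pairs with a complementary driver copy if one is available, and only unpaired tester copies survive. Kinetics, hybridization efficiency and the separation step itself are deliberately ignored, and the transcript names and counts are hypothetical.

```python
from __future__ import annotations


def subtract(tester: dict[str, int], driver: dict[str, int], excess: int = 20) -> dict[str, int]:
    """Idealized subtractive hybridization.

    `driver` gives copy numbers per unit of driver; `excess` is how many units of
    driver are added per unit of tester. Each tester copy is assumed to hybridize
    to one driver copy of the same species; unhybridized copies are returned.
    """
    remaining: dict[str, int] = {}
    for species, copies in tester.items():
        unpaired = copies - driver.get(species, 0) * excess
        if unpaired > 0:
            remaining[species] = unpaired
    return remaining


tester = {"housekeeping": 500, "induced": 300, "tester_only": 40}
driver = {"housekeeping": 500, "induced": 5}
print(subtract(tester, driver))  # {'induced': 200, 'tester_only': 40}
```

In this toy model, raising `excess` removes shared species more completely but also removes more of an up-regulated species, so only transcripts absent from the driver survive intact.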

Physical separation 

Original subtractive hybridization technology involved the physical separation
of hybridized common species from unique single-stranded species. Several methods
of achieving this have been described, including hydroxyapatite chromatography
(Sargent and Dawid 1983), avidin-biotin technology (Duguid and Dinauer 1990)
and oligo(dT)-latex separation (Hara et al. 1991). In the first approach, common
mRNA species are removed by cDNA (from test cells)-mRNA (from control cells)
subtractive hybridization followed by hydroxyapatite chromatography, as
hydroxyapatite specifically adsorbs the cDNA-mRNA hybrids. The unadsorbed cDNA is
then used either for the construction of a cDNA library of differentially expressed
genes (Sargent and Dawid 1983, Schneider et al. 1988) or directly as a probe to
screen a preselected library (Zimmerman et al. 1980, Davis et al. 1984, Hedrick et al.
1984). A schematic diagram of the procedure is shown in figure 1.

Figure 1. The hydroxyapatite method of subtractive hybridization. cDNA derived from the
treated/altered (tester) population is mixed with a large excess of mRNA from the control (driver)
population. Following hybridization, mRNA-cDNA hybrids are removed by hydroxyapatite
chromatography. The only cDNAs which remain are those which are differentially expressed in
the treated/altered population. In order to facilitate the recovery of full-length clones, small cDNA
fragments are removed by exclusion chromatography. The remaining cDNAs are then cloned into
a vector for sequencing, or labelled and used directly to probe a library, as described by Sargent
and Dawid (1983).

Less rigorous physical separation procedures coupled with sensitivity-enhancing
PCR steps were later developed as a means to overcome some of the problems
encountered with the hydroxyapatite procedure. For example, Duguid and Dinauer
(1990) described a method of subtraction utilizing biotin-affinity systems as a means
to remove hybridized common sequences. In this process, both the control and
tester mRNA populations are first converted to cDNA, and an adaptor ('oligovector',
containing a restriction site) is ligated to both ends. Both populations are then
amplified by PCR, but the driver cDNA population is subsequently digested with
the restriction endonuclease that recognizes the adaptor site. This serves to cleave the
oligovector and reduce the amplification potential of the control population. The digested
control population is then biotinylated and an excess is mixed with tester cDNA.
Following denaturation and hybridization, the mix is applied to a biocytin column
(streptavidin may also be used) to remove the control population, including
heteroduplexes formed by annealing of common sequences from the tester
population. The procedure is repeated several times following the addition of fresh
control cDNA. In order to further enrich those species differentially expressed in
the tester cDNA, the subtracted tester population is amplified by PCR following
every second subtraction cycle. After six cycles of subtraction (three reamplification
steps) the reaction mix is ligated into a vector for further analysis.
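The cycle structure described above (six subtraction cycles, with PCR reamplification of the subtracted tester after every second cycle) can be laid out as a simple schedule. This Python sketch only enumerates the steps; the step labels and the `subtraction_schedule` helper are hypothetical.

```python
from __future__ import annotations


def subtraction_schedule(cycles: int = 6, amplify_every: int = 2) -> list[str]:
    """List the steps of a repeated biotin-affinity subtraction protocol.

    Each cycle adds fresh biotinylated driver cDNA, hybridizes, and removes the
    biotinylated material on an affinity column; the subtracted tester is
    reamplified by PCR after every `amplify_every` cycles.
    """
    steps = []
    for cycle in range(1, cycles + 1):
        steps.append(f"cycle {cycle}: hybridize with fresh biotinylated driver, remove on affinity column")
        if cycle % amplify_every == 0:
            steps.append(f"after cycle {cycle}: PCR reamplification of subtracted tester")
    steps.append("ligate subtracted tester into vector")
    return steps


schedule = subtraction_schedule()
for step in schedule:
    print(step)
print(sum("PCR" in s for s in schedule))  # 3
```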

In a slightly different approach, Hara et al. (1991) utilized a method whereby
oligo(dT)30 primers attached to a latex substrate are used to first capture mRNA
extracted from the control population. Following 1st strand cDNA synthesis, the
RNA strand of the heteroduplexes is removed by heat denaturation and
centrifugation (the cDNA-oligotex-dT30 forms a pellet and the supernatant is removed).
A quantity of tester mRNA is then repeatedly hybridized to the immobilized control
(driver) cDNA (which is present in 20-fold excess). After several rounds of
hybridization the only mRNA molecules left in the tester mRNA population are
those which are not found in the driver cDNA-oligotex-dT30 population. These
tester-specific mRNA species are then converted to cDNA and, following the
addition of adaptor sequences, amplified by PCR. The PCR products are then
ligated into a vector for further analysis using restriction sites incorporated into the
PCR primers. A schematic illustration of this subtraction process is shown in figure
2.

However, all these methods utilising physical separation have been described as inefficient due to the requirement for large starting amounts of mRNA, significant loss of material during the separation process and a need for several rounds of hybridization. Hence, new methods of differential expression analysis have recently been designed to eliminate these problems.






Chemical Cross-Linking Subtraction (CCLS)

In this technique, originally described by Hampson et al. (1992), driver mRNA is mixed with tester cDNA (1st strand only) in a ratio of > 20:1. The common sequences form cDNA:mRNA hybrids, leaving the tester-specific species as single stranded cDNA. Instead of physically separating these hybrids, they are inactivated chemically using 2,5-diaziridinyl-1,4-benzoquinone (DZQ). Labelled probes are then synthesized from the remaining single stranded cDNA species (unreacted mRNA species remaining from the driver are not converted into probe material due to the specificity of the Sequenase T7 DNA polymerase used to make the probe) and used to screen a cDNA library made from the tester cell population. A schematic diagram of the system is shown in figure 3.

It has been shown that the differentially expressed sequences can be enriched at least 300-fold with one round of subtraction (Hampson et al. 1992), and that the technique should allow isolation of cDNAs derived from transcripts that are present at less than 50 copies per cell. This equates to genes at the low end of intermediate abundance (see table 1). The main advantages of the CCLS approach are that it is rapid, technically simple and also produces fewer false positives than other differential expression analysis methods. However, like the physical separation protocols, a major drawback with CCLS is the large amount of starting material required (at least 10 µg RNA). Consequently, the technique has recently been refined so that a renewable source of RNA can be generated. The degenerate random oligonucleotide primed (DROP) adaptation (Hampson et al. 1996, Hampson and Hampson 1997) uses random hexanucleotide sequences to prime solid phase-synthesized cDNA. Since each primer includes a T7 polymerase promoter sequence
Figure 3. Chemical cross-linking subtraction. Excess driver mRNA is mixed with 1st strand tester cDNA. The common sequences form mRNA:cDNA hybrids which are cross-linked with 2,5-diaziridinyl-1,4-benzoquinone (DZQ), and the remaining cDNA sequences are those differentially expressed in the tester population. Probes are made from these sequences using Sequenase 2.0 DNA polymerase, which lacks reverse transcriptase activity and, therefore, does not react with the remaining mRNA molecules from the driver. The labelled probes are then used to screen a cDNA library for clones of differentially expressed sequences. Adapted from W.lter et al. (1 9%), with



Table 1. The abundance of mRNA species and classes in a typical mammalian cell.

mRNA class      Copies of each   No. of species   Mean % of mRNA     Mean mass (ng) of each
                species/cell     in each class    mass per species   species/µg total RNA

Abundant        12000            4                3.3                1.65
Intermediate    300              500              0.08               0.04
Rare            15               11000            0.004              0.002

Modified from Bertioli et al. (1995).
at the 5' end, the final pool of random cDNA fragments is a PCR-renewable cDNA population which is representative of the expressed gene pool and can be used to synthesize sense RNA for use as driver material. Furthermore, if the final pool of random cDNA fragments is reamplified using biotinylated T7 primer and random hexamer, the product can be captured with streptavidin beads and the antisense strand eluted for use as tester. Since both target and driver can be generated from the same DROP product, subtraction can be performed in both directions (i.e. for up- and down-regulated species) between two different DROP products.
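The columns of Table 1 are internally related, which provides a quick check on the tabulated values. Assuming, as the figures suggest, that the fourth column gives the mean percentage of total mRNA mass contributed by each species, multiplying it by the number of species per class should account for almost all of the mRNA, and the ratio of the last two columns should be constant. The Python sketch below performs both checks; the implied mRNA content of total RNA is an inference from the table, not a value stated in it.

```python
# Values transcribed from Table 1 (modified from Bertioli et al. 1995).
TABLE_1 = {
    # class: (copies per cell, species in class,
    #         assumed mean % of mRNA mass per species,
    #         mean mass in ng per species per µg total RNA)
    "abundant": (12000, 4, 3.3, 1.65),
    "intermediate": (300, 500, 0.08, 0.04),
    "rare": (15, 11000, 0.004, 0.002),
}


def class_share_of_mrna(table):
    """Percentage of total mRNA mass contributed by each abundance class."""
    return {cls: n_species * pct for cls, (_, n_species, pct, _) in table.items()}


def implied_mrna_fraction(table):
    """Fraction of total RNA that is mRNA, implied by the last two columns.

    A species making up pct % of mRNA contributes 1000 ng * f * pct / 100
    per µg of total RNA, where f is the mRNA fraction of total RNA.
    """
    return {cls: mass_ng / (1000 * pct / 100)
            for cls, (_, _, pct, mass_ng) in table.items()}


if __name__ == "__main__":
    shares = class_share_of_mrna(TABLE_1)
    for cls, share in shares.items():
        print(f"{cls:>12}: {share:5.1f}% of mRNA mass")
    print(f"{'total':>12}: {sum(shares.values()):5.1f}%")
    print(implied_mrna_fraction(TABLE_1))
```

The three classes account for about 97% of mRNA mass (13.2%, 40% and 44%), and every row implies that mRNA makes up about 5% of total RNA. The calculation also shows why the rare class matters: although each rare species is tiny, together they carry about as much mRNA mass as the intermediate class.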

Representational Difference Analysis (RDA)

RDA of cDNA (Hubank and Schatz 1994) is an extension of the technique originally applied to genomic DNA as a means of identifying differences between two complex genomes (Lisitsyn et al. 1993). It is a process of subtraction and amplification involving subtractive hybridization of the tester in the presence of excess driver. Sequences in the tester that have homologues in the driver are rendered unamplifiable, whereas those genes expressed only in the tester retain the ability to be amplified by PCR. The procedure is shown schematically in figure 4.

In essence, the driver and tester mRNA populations are first converted to cDNA and amplified by PCR following the ligation of an adaptor. The adaptors are then removed from both populations and a new (different) adaptor ligated to the amplified tester population only. Driver and tester populations are next melted and hybridized together in a ratio of 100:1. Following hybridization, only tester:tester homohybrids have 5' adaptors at each end of the DNA duplex and can, thus, be filled in at both 3' ends. Hence, only these molecules are amplified exponentially during the subsequent PCR step. Although tester:driver heterohybrids are present, they only amplify in a linear fashion, since the strand derived from the driver has no adaptor to which the primer can bind. Driver:driver homohybrids have no adaptors and, therefore, are not amplified. Single stranded molecules are digested with mung bean nuclease before a further PCR-enrichment of the tester:tester homohybrids. The adaptors on the amplified tester population are then replaced and the whole process repeated a further two or three times using an increasing excess of driver (Hubank and Schatz used tester:driver ratios of 1:400, 1:80000 and 1:800000 for the second, third and fourth hybridizations, respectively). Different adaptors are ligated to the tester between successive rounds of hybridization and amplification to prevent the accumulation of PCR products that might interfere with subsequent amplifications. The final display is a series of differentially expressed gene products easily observable on an ethidium bromide gel.
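Why only tester:tester homohybrids are enriched can be seen from an idealized calculation. The Python sketch below assumes that hybridization goes to completion, that each tester strand pairs with a tester or driver strand in proportion to their abundance, that PCR is 100% efficient, and that tester:driver hybrids gain one linear copy per cycle (single-stranded products that are later removed with mung bean nuclease). All parameter values are hypothetical.

```python
def homohybrid_fraction(tester, driver):
    """Fraction of a sequence's tester strands that re-anneal to tester strands.

    Idealized: hybridization is complete and partners are chosen at random in
    proportion to strand abundance.
    """
    return tester / (tester + driver)


def rda_products(tester, driver, cycles=20):
    """Exponential and linear products for one sequence after one RDA round.

    tester, driver: relative amounts of the sequence in each population
    cycles: number of PCR cycles (assumed 100% efficient)
    """
    homo = tester * homohybrid_fraction(tester, driver)
    hetero = tester - homo
    exponential = homo * 2 ** cycles  # tester:tester, adaptors on both strands
    linear = hetero * cycles          # tester:driver, one primer site only
    return exponential, linear


if __name__ == "__main__":
    # Same tester amount; one sequence absent from the driver, one shared and
    # present at the 100-fold driver excess used in the first hybridization.
    for label, driver in (("tester-only", 0.0), ("shared", 100.0)):
        exp_amt, lin_amt = rda_products(1.0, driver)
        print(f"{label:>11}: exponential {exp_amt:.3g}, linear {lin_amt:.3g}")
```

In this model a tester-only sequence ends up 101-fold (1 + driver excess) more abundant than an equally expressed shared sequence after a single round, and linear products are negligible beside exponential ones.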

The main advantages of RDA are that it offers a reproducible and sensitive approach to the analysis of differentially expressed genes. Hubank and Schatz (1994) reported that they were able to isolate genes that were differentially expressed in substantially less than 1% of the cells from which the tester is derived. Perhaps the main drawback is that multiple rounds of ligation, hybridization, amplification and digestion are required. The procedure is, therefore, lengthier than many other differential display approaches and provides more opportunity for operator-induced error to occur. Although the generation of false positives has been noted, this has been solved to some degree by O'Neill and Sinclair (1997) through the use of HPLC-purified adaptors. These are free of the truncated adaptors which appear to be a major source of the false positive bands. A very similar technique to RDA, termed linker capture subtraction (LCS), was described by Yang and Sytkowski (1996).
Figure 4. The representational difference analysis (RDA) technique. Driver and tester cDNAs are digested with a 4-cutter restriction enzyme such as DpnII. The 1st set of 12/24 adaptor strands (oligonucleotides) are ligated to each other and to the digested cDNA products. The 12mer is subsequently melted away and the 3' ends filled in using Taq DNA polymerase. Each cDNA population is then amplified using PCR, following which the 1st set of adaptors is removed with DpnII. A second set of 12/24 adaptor strands is then added to the amplified tester cDNA population, after which the tester is hybridized against an excess of driver. The 12mer adaptors are melted and the 3' ends filled in as before. PCR is carried out with primers identical to the new 24mer adaptor. Thus, the only hybridization products which are exponentially amplified are tester:tester combinations. Following PCR, ssDNA products are removed with mung bean nuclease, leaving the 'first difference product'. This is digested and a third set of 12/24 adaptors added before repeating the subtraction process from the hybridization step. The cycle is repeated to the 3rd or 4th difference product, as described by Lisitsyn et al. (1993) and Hubank and Schatz (1994).
Suppression PCR Subtractive Hybridization (SSH)

The most recent adaptation of the SH approach to differential expression analysis was first described by Diatchenko et al. (1996) and Gurskaya et al. (1996). They reported that a 1000-5000 fold enrichment of rare cDNAs (equivalent to isolating mRNAs present at only a few copies per cell) can be obtained without the need for multiple hybridizations/subtractions. Instead of physical or chemical removal of the common sequences, a PCR-based suppression system is used (see figure 5).

In SSH, excess driver cDNA is added to two portions of the tester cDNA which have been ligated with different adaptors. A first round of hybridization serves to enrich differentially expressed genes and equalize rare and abundant messages. Equalization occurs because reannealing is more rapid for abundant molecules than for rarer molecules, owing to the second order kinetics of hybridization (James and Higgins 1985). The two primary hybridization mixes are then mixed together in the presence of excess driver and allowed to hybridize further. This step permits the annealing of single stranded complementary sequences which did not hybridize in the primary hybridization, and in doing so generates templates for PCR amplification. Although there are several possible combinations of the single stranded molecules present in the secondary hybridization mix, only one particular combination (sequences differentially expressed in the tester cDNA, composed of complementary strands carrying different adaptors) can amplify exponentially.
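The equalization step rests on second-order reannealing kinetics: for a species at initial single-stranded concentration C0 reannealing with rate constant k, dC/dt = -kC^2, so C(t) = C0/(1 + kC0t). The Python sketch below evaluates this expression for a hypothetical abundant and a hypothetical rare species; it ignores the driver and assumes the same rate constant for both species, in arbitrary units.

```python
def single_stranded(c0, k, t):
    """Concentration still single-stranded after time t.

    Second-order reannealing, dC/dt = -k * C**2, integrates to
    C(t) = C0 / (1 + k * C0 * t).
    """
    return c0 / (1.0 + k * c0 * t)


if __name__ == "__main__":
    k = 1.0                          # arbitrary units, same for both species
    abundant, rare = 1000.0, 1.0     # hypothetical starting concentrations
    for t in (0.0, 0.001, 0.01, 0.1, 1.0):
        ratio = single_stranded(abundant, k, t) / single_stranded(rare, k, t)
        print(f"t = {t:<6} abundant:rare single-stranded ratio = {ratio:8.1f}")
```

As kC0t grows large, C(t) approaches 1/(kt) regardless of C0, which is why a 1000-fold initial difference shrinks to about 2-fold by t = 1 in this example.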

Having obtained the final differential display, two options are available if cloning of cDNAs is desired. One is to transform the whole of the final PCR reaction into competent cells. Transformed colonies can then be isolated and their inserts characterized by sequencing, restriction analysis or PCR. Alternatively, the final PCR products can be resolved on a gel and the individual bands excised, reamplified and cloned. The first approach is technically simpler and less time consuming. However, ligation/transformation reactions are known to be biased towards the cloning of smaller molecules, and so the final population of clones will probably not contain a representative selection of the larger products. In addition, although equalization theoretically occurs, observations in this laboratory suggest that this is by no means perfectly accomplished. Consequently, some gene species are present in a higher number than others and this will be reflected in the final population of clones. Thus, in order to obtain a substantial proportion of those gene species that actually demonstrate differential expression in the tester population, the number of clones that will have to be screened after this step may be substantial. The second approach is initially more time consuming and technically demanding. However, it would appear to offer better prospects for cloning larger and low abundance gel products. In addition, one can incorporate a screening step that distinguishes products of the same size but different sequence (HA-staining, see later). In this way, a good idea of the final number of clones to be isolated and identified can be achieved.
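The effect of imperfect equalization on the number of clones to screen can be estimated by treating clone picking as random sampling. The Python sketch below assumes that clones are drawn independently in proportion to each species' representation in the transformed population; the two clone populations in the example are hypothetical.

```python
def expected_species_seen(proportions, n_clones):
    """Expected number of distinct species represented among n random clones."""
    return sum(1.0 - (1.0 - p) ** n_clones for p in proportions)


def clones_needed(proportions, target=0.95):
    """Smallest number of clones for which the expected fraction of species
    recovered reaches `target` (must be strictly between 0 and 1)."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be between 0 and 1")
    goal = target * len(proportions)
    n = 0
    while expected_species_seen(proportions, n) < goal:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical: 50 differentially expressed species.
    equalized = [1 / 50] * 50
    # Imperfect equalization: 5 species make up half of all clones.
    skewed = [0.5 / 5] * 5 + [0.5 / 45] * 45
    print("equalized:", clones_needed(equalized))
    print("skewed:   ", clones_needed(skewed))
```

In this example, letting five species take half of the clones raises the number needed to recover 95% of species from 149 to 259.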

An alternative (or even complementary) approach is to use the final differential display reaction to screen a cDNA library to isolate full length clones for further characterization, or a DNA array (see later) to quickly identify known genes. SSH has been used in this laboratory to begin characterization of the short-term gene expression profiles of enzyme-inducers such as phenobarbital (Rockett et al. 1997) and Wy-14,643 (Rockett et al. unpublished observations). The isolation of differentially expressed genes in this manner enables the construction of a fingerprint
for between 3 and 8 h. This serves two purposes: (1) to equalize rare and abundant molecules; and (2) to enrich for differentially expressed sequences, since cDNAs that are not differentially expressed form type c molecules with the driver. In the secondary hybridization, the two primary hybridizations are mixed together without denaturing. Fresh denatured driver can also be added at this point to allow further enrichment of differentially expressed sequences. Type e molecules are formed in this secondary hybridization and are subsequently amplified using two rounds of PCR. The final products can be visualized on an agarose gel, labelled directly or cloned into a vector for downstream manipulation. As described by Diatchenko et al. (1996) and Gurskaya
of expressed genes which are unique to each compound and time/dose point. Such information could be useful in short-term characterization of the toxic potential of new compounds by comparing the gene-expression profiles they elicit with those produced by known inducers. Figure 6 shows a flow diagram of the method used to isolate, verify and clone differentially expressed genes, and figure 7 shows expression profiles obtained from a typical SSH experiment. Subsequent sub-cloning of the individual bands, sequencing and gene database interrogation reveals many genes which are either up- or down-regulated by phenobarbital in the rat (tables 2 and 3).
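One simple way to compare the profile elicited by a new compound with those of known inducers is to score each gene +1 (up-regulated) or -1 (down-regulated) and compute a cosine similarity between the resulting vectors. The Python sketch below only illustrates that idea: the scoring scheme is an assumption, and the gene names and profiles in the example are placeholders, not data from tables 2 and 3.

```python
import math


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity between two signed expression fingerprints.

    Each profile maps gene -> +1 (up-regulated) or -1 (down-regulated);
    genes absent from a profile count as 0 (unchanged or not detected).
    """
    genes = set(profile_a) | set(profile_b)
    dot = sum(profile_a.get(g, 0) * profile_b.get(g, 0) for g in genes)
    norm_a = math.sqrt(sum(v * v for v in profile_a.values()))
    norm_b = math.sqrt(sum(v * v for v in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_references(query, references):
    """Return (name, similarity) pairs, most similar reference first."""
    scores = [(name, cosine_similarity(query, prof))
              for name, prof in references.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy fingerprints with placeholder gene names (hypothetical data).
    references = {
        "inducer_X": {"geneA": 1, "geneB": 1, "geneC": -1},
        "inducer_Y": {"geneD": 1, "geneE": -1, "geneC": 1},
    }
    new_compound = {"geneA": 1, "geneB": 1, "geneE": -1}
    for name, score in rank_references(new_compound, references):
        print(f"{name}: {score:+.2f}")
```

Because unknown genes and ESTs can be scored in the same way as identified genes, a comparison of this kind does not depend on knowing the function of every fingerprint component.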

One of the advantages in using the SSH approach is that no prior knowledge is required of which specific genes are up/down-regulated subsequent to xenobiotic
Figure 7. SSH display patterns obtained from rat liver following 3-day treatment with Wy-14,643 or phenobarbital. mRNA extracted from control and treated livers was used to generate the differential displays using the PCR-Select cDNA subtraction kit (Clontech). Lane 1, 1 kb ladder; lane 2, genes upregulated following Wy-14,643 treatment; lane 3, genes downregulated following Wy-14,643 treatment; lane 4, genes upregulated following phenobarbital treatment; lane 5, genes downregulated following phenobarbital treatment; lane 6, 1 kb ladder. Reproduced from Rockett et al. (1997), with permission.

exposure, and an almost complete complement of genes is obtained. For example, the peroxisome proliferator and non-genotoxic hepatocarcinogen Wy-14,643 up-regulates at least 28 genes and down-regulates at least 15 in the rat (a sensitive species) and produces 48 up- and 37 down-regulated genes in the guinea pig, a resistant species (Rockett, Swales, Esdaile and Gibson, unpublished observations). One of these genes, CD81, was up-regulated in the rat and down-regulated in the guinea pig following Wy-14,643 treatment. CD81 (alternatively named TAPA-1) is a widely expressed cell surface protein which is involved in a large number of cellular processes including adhesion, activation, proliferation and differentiation (Levy et al. 1998). Since all of these functions are altered to some extent in the phenomena of hepatomegaly and non-genotoxic hepatocarcinogenesis, it is intriguing, and probably mechanistically relevant, that CD81 expression is differentially regulated in a resistant and a susceptible species. However, the down-side of this approach is that, although the majority of genes can be sequenced and matched to database sequences, the matches are predominantly expressed sequence tags or genes of completely unknown function, thus partially obscuring a realistic overall assessment of the critical genes of genuine biological interest. Notwithstanding the lack of complete functional identification of altered gene expression, such gene profiling studies essentially provide a 'molecular fingerprint' in response to xenobiotic challenge, thereby serving as a mechanistically-relevant platform for further detailed investigations.

Differential Display (DD)

Originally described as 'RNA fingerprinting by arbitrarily primed PCR' (Liang and Pardee 1992), this method is now more commonly referred to as 'differential






Differential gene expression



Table 2. Genes up-regulated in rat liver following 3-day exposure to phenobarbital.

Band number           Highest sequence   FASTA-EMBL gene identification
(approximate size     similarity
in bp)

5 (1300)              93.5%              CYP2B1
7 (1000)              95.1%              Preproalbumin
                                         Serum albumin mRNA
8 (950)               98.3%              NCI-CGAP-Pr1 H. sapiens (EST)
10 (850)              95.7%              CYP2B1
11 (800)              Clone 1: 94.9%     CYP2B1
                      Clone 2: 75.3%     CYP2B2
12 (750)              93.3%              TRPM-2 mRNA
                                         Sulfated glycoprotein
15 (600)              92.9%              Preproalbumin
                                         Serum albumin mRNA
16 (550)              Clone 1: 95.2%     CYP2B1
                      Clone 2: 93.6%     Haptoglobin mRNA, partial alpha
21 (350)              99.3%              18S, 5.8S and 28S rRNA

Bands 1-4, 6, 9, 13, 14 and 17-20 were shown to be false positives by dot blot analysis and, therefore, were not sequenced. Derived from Rockett et al. (1997). It should be noted that the above genes do not represent the complete spectrum of genes which are up-regulated in rat liver by phenobarbital, but simply represent the genes sequenced and identified to date.



Table 3. Genes down-regulated in rat liver following 3-day exposure to phenobarbital.

Band number           Highest sequence   FASTA-EMBL gene identification
(approximate size     similarity
in bp)

1 (1500)              95.3%              3-oxoacyl-CoA thiolase
2 (1200)              92.3%              Hemopexin mRNA
3 (1000)              91.7%              Alpha-2u-globulin mRNA
7 (700)               Clone 1: 77.2%     M. musculus C1 inhibitor
                      Clone 2: 94.5%     Electron transfer flavoprotein
                      Clone 3: 91.0%     M. musculus topoisomerase I (Topo I)
8 (650)               Clone 1: 86.9%     Soares 2NbMT M. musculus (EST)
                      Clone 2: 96.2%     Alpha-2u-globulin (s-type) mRNA
9 (600)               Clone 1: 86.9%     Soares mouse NML M. musculus (EST)
                      Clone 2: 82.0%     Soares p3NMF19.5 M. musculus (EST)
10 (550)              73.3%              Soares mouse NML M. musculus (EST)
11 (525)              95.7%              NCI-CGAP-Pr1 H. sapiens (EST)
12 (375)              100.0%             Ribosomal protein
13 (230)              Clone 1: 97.2%     Soares mouse embryo NbME13.5 (EST)
                      Clone 2: 100.0%    Fibrinogen B-beta-chain
                      Clone 3: 100.0%    Apolipoprotein E gene
14 (170)              96.0%              Soares p3NMF19.5 M. musculus (EST)
15 (140)              97.3%              Stratagene mouse testis (EST)
Others (300)          96.7%              R. norvegicus RASP 1 mRNA
       (275)          93.1%              Soares mouse mammary gland (EST)

EST, expressed sequence tag. Bands 4-6 were shown to be false positives by dot blot analysis and, therefore, were not sequenced. Derived from Rockett et al. (1997). It should be noted that the above genes do not represent the complete spectrum of genes which are down-regulated in rat liver by phenobarbital, but simply represent the genes sequenced and identified to date.



display' (DD). In this method, all the mRNA species in the control and treated cell populations are amplified in separate reactions using reverse transcriptase-PCR (RT-PCR). The products are then run side-by-side on sequencing gels. Those bands which are present in one display only, or which are much more intense in one display compared to the other, are likely to be differentially expressed and may be recovered for further characterization. One advantage of this system is the speed with which it can be carried out: 2 days to obtain a display and as little as a week to make and identify clones.

Two commonly used variations are based on different methods of priming the reverse transcription step (figure 8). One is to use an oligo dT with a 2-base 'anchor' at the 3'-end, e.g. 5'-(dT)12CA-3' (Liang and Pardee 1992). Alternatively, an arbitrary primer may be used for 1st strand cDNA synthesis (Welsh et al. 1992). This variant of RNA fingerprinting has also been called 'RAP' (RNA Arbitrarily Primed)-PCR. One advantage of this second approach is that PCR products may be derived from anywhere in the RNA, including open reading frames. In addition, it can be used for mRNAs that are not polyadenylated, such as many bacterial mRNAs (Wong and McClelland 1994). In both cases, following reverse transcription and denaturation, second strand cDNA synthesis is carried out with an arbitrary primer (arbitrary primers have a single base at each position, as compared to random primers, which contain a mixture of all four bases at each position). The resulting PCR, thus, produces a series of products which, depending on the system (primer length and composition, polymerase and gel system), usually includes 50-100 products per primer set (Band and Sager 1989). When a combination of different dT-anchors and arbitrary primers is used, almost all mRNA species from a cell can be amplified. When the cDNA products from two different populations are analysed side by side on a polyacrylamide gel, differences in expression can be identified and the appropriate bands recovered for cloning and further analysis.
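How many primer combinations are needed to display 'almost all' mRNA species can be estimated with a Poisson model. The Python sketch below assumes that each primer set amplifies a fixed number of species drawn independently and uniformly at random, taking 50-100 products per set from the text and about 11500 species per cell from Table 1 (4 + 500 + 11000). Uniform sampling is an idealization: it ignores the possible bias towards abundant mRNAs discussed below.

```python
import math

SPECIES_PER_CELL = 4 + 500 + 11000  # abundant + intermediate + rare (Table 1)


def fraction_displayed(n_primer_sets, products_per_set, n_species=SPECIES_PER_CELL):
    """Expected fraction of mRNA species amplified by at least one primer set.

    Idealized Poisson model: each primer set amplifies `products_per_set`
    species chosen independently and uniformly at random.
    """
    return 1.0 - math.exp(-n_primer_sets * products_per_set / n_species)


def primer_sets_for_coverage(target, products_per_set, n_species=SPECIES_PER_CELL):
    """Smallest number of primer sets giving the target expected coverage."""
    return math.ceil(-n_species * math.log(1.0 - target) / products_per_set)


if __name__ == "__main__":
    for products in (50, 100):
        n = primer_sets_for_coverage(0.95, products)
        print(f"{products} products/set: {n} primer sets "
              f"(coverage {fraction_displayed(n, products):.3f})")
```

Under these assumptions, 95% coverage needs about 345 primer sets at 100 products per set and about 690 at 50 products per set, which is why many combinations of dT-anchors and arbitrary primers are required.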

Although DD is perhaps the most popular approach used today for identifying differentially expressed genes, it does suffer from several perceived disadvantages:

(1) It may have a strong bias towards high copy number mRNAs (Bertioli et al. 1995), although this has been disputed (Wan et al. 1996) and the isolation of very low abundance genes may be achieved in certain circumstances (Guimaraes et al. 1995a).

(2) The cDNAs obtained often only represent the extreme 3' end of the mRNA (often the 3'-untranslated region), although this may not always be the case (Guimaraes et al. 1995a). Since the 3' end is often not included in GenBank and shows variation between organisms, cDNAs identified by DD cannot always be matched with their genes, even if those genes have already been identified.

(3) The pattern of differential expression seen on the display often cannot be reproduced on Northern blots, with false positives arising in up to 70% of cases (Sun et al. 1994). Some adaptations have been shown to reduce false positives, including the use of two reverse transcriptases (Sung and Denman 1997), comparison of uninduced and induced cells over a time course (Burn et al. 1994) and comparison of DD-PCR products from two uninduced and two induced lines (Sompayrac et al. 1995). The latter authors also reported that the use of cytoplasmic RNA rather than total RNA reduces false positives arising from nuclear RNA that is not transported to the cytoplasm.

Further details of the background, strengths and weaknesses of the DD technique can be obtained from a review by McClelland et al. (1996) and from articles by Liang et al. (1995) and Wan et al. (1996).
Figure 8. Two approaches to differential display (DD) analysis. 1st strand synthesis can be carried out either with an anchored polydT-NN primer (where N = C or A) or with an arbitrary primer. The use of an arbitrary primer also enables the priming of non-polyadenylated mRNAs. Arbitrary primers may hybridize at none, one or more places along the length of the mRNA, allowing 1st strand cDNA synthesis to occur at none, one or more points in the same gene. In both cases, 2nd strand synthesis is carried out with an arbitrary primer. Since these arbitrary primers for the 2nd strand may also hybridize to the 1st strand cDNA at more than one point, several different 2nd strand products may be obtained from one 1st strand template. Following 2nd strand synthesis, the cDNA can be amplified by PCR using the original primer pair, with the result that numerous gene sequences are amplified.

Restriction endonuclease-facilitated analysis of gene expression

Serial Analysis of Gene Expression (SAGE)

A more recent development in the field of differential display is SAGE analysis (Velculescu et al. 1995). This method uses a different approach to those discussed so far and is based on two principles. Firstly, in more than 95% of cases, short nucleotide sequences ('tags') of only nine or 10 base pairs provide sufficient information to identify their gene of origin. Secondly, concatenation (linking together in a series) of these tags allows sequencing of multiple cDNAs within a single clone. Figure 9 shows a schematic representation of the SAGE process. In this procedure, double stranded cDNA from the test cells is synthesized with a biotinylated poly(dT) primer. Following digestion with a frequently cutting (4 bp recognition sequence) restriction enzyme ('anchoring enzyme'), the 3' ends of the cDNA population are captured with streptavidin beads. The captured population is split into two and different adaptors ligated to the 5' ends of each group. Incorporated into the adaptors is a recognition sequence for a type IIS restriction enzyme, one which cuts DNA at a defined distance (< 20 bp) from its recognition sequence. Hence, following digestion of each captured cDNA population with the IIS enzyme, the adaptors plus a short piece of the captured cDNA are released. The two populations are then ligated and the products amplified. The amplified products are cleaved with the original anchoring enzyme, religated (concatemers are formed in the process) and cloned. The advantage of this system is that hundreds of gene tags can be identified by sequencing only a few clones. Furthermore, the number of times a given transcript is identified is a quantitative measurement of that gene's abundance in the original population, a feature which facilitates identification of differentially expressed genes in different cell populations.
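The tag-counting step lends itself to a short illustration. The Python sketch below splits sequenced concatemers at the anchoring-enzyme site (CATG, as drawn in figure 9), takes the first tag of each ditag directly and the second as the reverse complement of the ditag's end (the two tags are joined tail-to-tail), and counts tags across clones. The 9 bp tag length follows figure 9; the example sequence is invented, and the fold-change comparison with a pseudocount is an assumption; a real analysis would also need a statistical test and handling of sequencing errors.

```python
from collections import Counter

ANCHOR = "CATG"  # anchoring-enzyme site, as drawn in figure 9
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def tags_from_concatemer(concatemer, tag_length=9):
    """Return the tags contained in one sequenced concatemer.

    Only segments lying between two anchors are ditags; sequence before the
    first or after the last anchor is flanking vector. The first tag is read
    directly after the anchor; the second was joined tail-to-tail, so it is
    recovered as the reverse complement of the ditag's last bases.
    """
    tags = []
    for ditag in concatemer.upper().split(ANCHOR)[1:-1]:
        if len(ditag) < 2 * tag_length:
            continue  # truncated ditag
        tags.append(ditag[:tag_length])
        tags.append(reverse_complement(ditag[-tag_length:]))
    return tags


def count_tags(concatemers, tag_length=9):
    """Tally tags over all sequenced clones; counts reflect transcript abundance."""
    counts = Counter()
    for concatemer in concatemers:
        counts.update(tags_from_concatemer(concatemer, tag_length))
    return counts


def fold_changes(counts_a, counts_b, pseudocount=1):
    """Ratio of normalized tag frequencies, library A over library B.

    A pseudocount avoids division by zero for tags seen in only one library.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    return {
        tag: ((counts_a.get(tag, 0) + pseudocount) / total_a)
        / ((counts_b.get(tag, 0) + pseudocount) / total_b)
        for tag in set(counts_a) | set(counts_b)
    }


if __name__ == "__main__":
    # Invented concatemer built from two ditags (tags are 9 bp here).
    concatemer = ("CATG" + "GATTACAGA" + reverse_complement("TTTCCCGGG")
                  + "CATG" + "GATTACAGA" + reverse_complement("ACGTACGTA") + "CATG")
    print(tags_from_concatemer(concatemer))
    print(count_tags([concatemer]).most_common(1))
```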

Some disadvantages of SAGE analysis are that the method is technically difficult, requires a large amount of accurate sequencing, is biased towards abundant mRNAs, has not been validated in the pharmaco/toxicogenomic setting and has only been used to examine well known tissue differences to date.

Gene Expression Fingerprinting (GEF)

A different capture/restriction digest approach for isolating differentially expressed genes has been described by Ivanova and Belyavsky (1995). In this method, RNA is converted to cDNA using biotinylated oligo(dT) primers. The cDNA population is then digested with a specific endonuclease and captured with magnetic streptavidin microbeads to facilitate removal of the unwanted 5' digestion products. The use of restricted 3'-ends alone serves to reduce the complexity of the cDNA fragment pool and helps to ensure that each RNA species is represented by not more than one restriction product. An adaptor is ligated to facilitate subsequent amplification of the captured population. PCR is carried out with one adaptor-specific and one biotinylated polydT primer. The reamplified population is recaptured and the non-biotinylated strands removed by alkaline dissociation. The non-biotinylated strand is then resynthesized using a different adaptor-specific primer in the presence of a radiolabelled dNTP. The labelled immobilized 3' cDNA ends are next sequentially treated with a series of different restriction endonucleases and the products from each digestion analysed by PAGE. The result is a fingerprint composed of a number of ladders (equal to the number of sequential digests used). By comparing test versus control fingerprints, it is possible to identify differentially expressed products which can then be isolated from the gel and cloned. The advantages of this procedure are that it is very robust and reproducible, and the authors estimate that 80-93% of cDNA molecules are represented in the final fingerprint. The disadvantage is that polyacrylamide gels can rarely resolve more than 300-400 bands, which compares poorly to the 1000 or more which are estimated to be produced in an average experiment. The use of 2-D gels such as those described by Uitterlinden et al. (1989) and Hatada et al. (1991) may help to overcome this problem.
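The resolution problem can be quantified with a simple occupancy model. The Python sketch below assumes that bands fall independently and uniformly among a fixed number of resolvable gel positions (roughly the 300-400 quoted above) and counts positions holding exactly one band; both the model and the uniform spacing are simplifying assumptions. A small simulation is included as a check on the analytical expression.

```python
import random


def expected_singletons(n_bands, n_positions):
    """Expected number of gel positions occupied by exactly one band.

    Model: each band lands independently and uniformly on one of n_positions
    resolvable positions; positions with two or more bands cannot be told
    apart and so do not yield an individually resolved band.
    """
    return n_bands * (1.0 - 1.0 / n_positions) ** (n_bands - 1)


def simulate_singletons(n_bands, n_positions, trials=500, seed=0):
    """Monte Carlo estimate of the same quantity, as a check on the formula."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        occupancy = [0] * n_positions
        for _ in range(n_bands):
            occupancy[rng.randrange(n_positions)] += 1
        total += occupancy.count(1)
    return total / trials


if __name__ == "__main__":
    for positions in (300, 400):
        print(positions, round(expected_singletons(1000, positions), 1),
              round(simulate_singletons(1000, positions), 1))
```

For 1000 bands and 400 positions, only about 82 bands are expected to be individually resolved; with 300 positions the figure falls to about 36.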

A similar method for displaying restriction endonuclease fragments was later described by Prashar and Weissman (1996). However, instead of sequential digestion of the immobilized 3'-terminal cDNA fragments, these authors simply compared the profiles of the control and treated populations without further manipulation.
...CATG XXXXXXXXX OOOOOOOOO CATG XXXXXXXXX OOOOOOOOO CATG...
         Tag 1     Tag 2          Tag 3     Tag 4

Figure 9. Serial analysis of gene expression (SAGE) analysis. cDNA is cleaved with an anchoring enzyme (AE) and the 3' ends captured using streptavidin beads. The cDNA pool is divided in half and each portion ligated to a different linker, each containing a type IIS restriction site (tagging enzyme, TE). Restriction with the type IIS enzyme releases the linker plus a short length of cDNA (XXXXX and OOOOO indicate nucleotides of different tags). The two pools of tags are then ligated and amplified using linker-specific primers. Following PCR, the products are cleaved with AE and the ditags isolated from the linkers using PAGE. The ditags are then ligated (during which process concatenation occurs) and cloned into a vector of choice for sequencing. After Velculescu et al. (1995), with permission.
DNA arrays

'Open' differential display systems are cumbersome in that it takes a great deal of time to extract and identify candidate genes and then confirm that they are indeed up- or down-regulated in the treated compared to the control tissue. Normally the latter process is carried out using Northern blotting or RT-PCR. Even so, each of the aforementioned steps produces a bottleneck to the ultimate goal of rapid analysis. The development of so-called DNA arrays (e.g. Gress et al. 1992, Zhao et al. 1995, Schena et al. 1996) offers a way around this bottleneck in differential gene expression analysis. DNA arrays consist of gridded membranes or glass 'chips' containing hundreds or thousands of DNA spots, each consisting of multiple copies of part of a known gene. The genes are often selected based on previously proven involvement in oncogenesis, cell cycling, DNA repair, development and other cellular processes. They are usually chosen to be as specific as possible for each gene and animal species.
Human and mouse arrays are already commercially available from a few companies,
including [...] Laboratories and
Research Genetics Inc. The technique is rapid in that hundreds or even thousands
of genes can be spotted on a single array, and mRNA/cDNA from the test
populations can be labelled and used directly as probe. When analysed with
appropriate hardware and software, arrays offer a rapid and quantitative means to
compare gene expression between two cell populations. Of course, there

can only be identification and quantitation of those genes which are in the array
(hence the term 'closed' system). Therefore, one approach to elucidating the
molecular mechanisms involved in a particular disease/development system may be
to combine an open and a closed system: a DNA array to directly identify and
quantitate the expression of known genes in mRNA populations, and an open
system such as SSH to isolate unknown genes which are differentially expressed.
One of the advantages of DNA arrays is the huge number of gene fragments
which can be spotted on to a membrane; some companies have reported gridding up to
60000 spots on a single glass 'chip' (microscope slide). These high density chip-
based micro-arrays will probably become available as mass-produced off-the-shelf
items in the near future. This should facilitate the more rapid determination of
differential expression in time and dose-response experiments. Aside from their
high cost and the technical complexities involved in producing and probing DNA
arrays, the main problem which remains, especially with the newer micro-array
(gene-chip) technologies, is that results are often not wholly reproducible between
arrays. However, this problem is being addressed and should be resolved within the



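As a rough sketch of how two array hybridizations might be compared, the Python below scales each array to the same total signal and reports the log2 ratio for spots changed at least twofold. The global total-intensity scaling, the twofold cut-off and the function name are illustrative assumptions; array manufacturers' software applies its own background correction and normalization.

```python
import math


def differential_spots(control: dict, treated: dict, min_fold: float = 2.0) -> dict:
    """Return {spot: log2(treated/control)} for spots changed >= min_fold.

    Each array is first scaled to the same total signal over the shared
    spots (a simple global normalization, assumed for illustration).
    """
    spots = control.keys() & treated.keys()
    control_total = sum(control[s] for s in spots)
    treated_total = sum(treated[s] for s in spots)
    cutoff = math.log2(min_fold)
    changed = {}
    for spot in spots:
        c = control[spot] / control_total
        t = treated[spot] / treated_total
        if c <= 0 or t <= 0:
            continue  # no ratio can be formed for an undetected spot
        log_ratio = math.log2(t / c)
        if abs(log_ratio) >= cutoff:
            changed[spot] = log_ratio
    return changed


# Hypothetical spot intensities from a control and a treated hybridization.
control = {"geneA": 100, "geneB": 100, "geneC": 100, "geneD": 100}
treated = {"geneA": 250, "geneB": 100, "geneC": 25, "geneD": 25}
print(differential_spots(control, treated))
```

Working in log2 ratios makes up- and down-regulation symmetrical: a fourfold decrease is -2, just as a fourfold increase is +2.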
EST databases as a means to identify differentially expressed genes

Expressed sequence tags (ESTs) are partial sequences of clones obtained from
cDNA libraries. Even though most ESTs have no formal identity (putative
identification is the best to be hoped for), they have proven to be a rapid and efficient
means of discovering new genes and can be used to generate profiles of gene
expression in specific cells. Since they were first described by Adams et al. (1991),
there has been a huge explosion in EST production and it is estimated that there are
now well over a million such sequences in the public domain, representing over half
of all human genes (Hillier et al. 1996). This large number of freely available
sequences (both sequence information and clones are normally available royalty-free
from the originators) has enabled the development of a new approach towards
differential gene expression analysis, as described by Vasmatzis et al. (1998). The
approach is simple in theory: EST databases are first searched for genes that have a
number of related EST sequences from the target tissue of choice, but none or few
from non-target tissue libraries.
from non-target tissue libraries. Programmes to assist in the assembly o! such sets of 

■ overlapping data may be developed in-house or obtained pnvaiely or from the 

■ mtemet. For example, the Institute for Genomic Research tTlGR. found at 
hrrp://www.tigr.org) provides many software tools free of charge to the scientific 
communit>'. Included amongst these is the TIGR assembler t Sutton et al. 19Q5j. a 
tool for the assembly of large sets of overlapping data such as ESTs. bacterial 
artificial chromosomes (BAC)s. or small genomes. Candidate EST clones repre- 
senting different genes are then analysed using RN.A blot methods for size and tissue 
specificity and, if required, used as probes to isolate and identify the full length 
cDNA clone for further characterization. In practice however, the method is rather 
more involved, requiring bioinformatic and computer analysis coupled with 
confirmatory molecular studies. Vasmatzis et al. (1998) have described several 
problems in this fledgling approach, such as separating highly homologous 
sequences deriveti from different genes and an overemphasis of spccihcirx- for some 
EST sequences. However, since these problems will largely be addressed by the 
development of more suitable computer algorithms and an increased completeness 
of the EST database, it is likely that this approach to identifying differentially 
expressed genes may enjoy more patronage in the future. 
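The in silico screen described above (many ESTs from the target tissue, none or few from other libraries) reduces to a counting problem once ESTs have been assembled into clusters. The sketch below assumes clustering has already been done (for example with an assembler such as the TIGR assembler) and that each EST is labelled with the tissue of its source library; the thresholds, tissue labels and function name are hypothetical.

```python
from collections import defaultdict


def tissue_specific_clusters(est_hits, target, min_target=5, max_other=0):
    """Return cluster IDs rich in ESTs from `target` but rare elsewhere.

    est_hits: iterable of (cluster_id, library_tissue) pairs, one per EST.
    """
    in_target = defaultdict(int)
    elsewhere = defaultdict(int)
    for cluster, tissue in est_hits:
        if tissue == target:
            in_target[cluster] += 1
        else:
            elsewhere[cluster] += 1
    return sorted(
        cluster for cluster, n in in_target.items()
        if n >= min_target and elsewhere[cluster] <= max_other
    )


# Hypothetical assembled clusters: cluster2 also has one EST from another tissue.
hits = ([("cluster1", "tissue_A")] * 6
        + [("cluster2", "tissue_A")] * 6 + [("cluster2", "tissue_B")]
        + [("cluster3", "tissue_A")] * 2)
print(tissue_specific_clusters(hits, "tissue_A"))
```

Relaxing `max_other` from 0 to 1 admits cluster2, which mirrors the "none or few" wording: the stricter the cut-off, the fewer but more tissue-specific the candidates passed on to RNA blot confirmation.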



Problems and potential of differential expression techniques

The holistic or single cell approach?

When working with in vivo models of differential expression, one of the first
issues to consider must be the presence of multiple cell types in any given specimen.
For example, a liver sample is likely to contain not only hepatocytes, but also
(potentially) Ito cells, bile ductule cells, endothelial cells, various immune cells (e.g.
lymphocytes, macrophages and Kupffer cells) and fibroblasts. Other tissues will
each have their own distinctive cell populations. Also, in the case of neoplastic tissue,
there are almost always normal, hyperplastic and/or dysplastic cells present in a
sample. One must, therefore, be aware that genes obtained from a differential
display experiment performed on an animal tissue model may not necessarily arise
exclusively from the intended 'target' cells, e.g. hepatocytes/neoplastic cells. If
appropriate, further analyses using immunohistochemistry, in situ hybridization or
in situ RT-PCR should be used to confirm which cell types are expressing the
gene(s) of interest. This problem is probably most acute for those studying the
differential expression of genes in the development of different cell types, where
there is a need to examine homogeneous cell populations. The problem is now being
addressed at the National Cancer Institute (Bethesda, MD, USA), where new micro-
dissection techniques have been employed to assist in their gene analysis programme,
the Cancer Genome Anatomy Project (CGAP) (for more information see web site:
http://www.ncbi.nlm.nih.gov/ncicgap/intro.html). There are also separation tech-
niques available that utilise cell-specific antigens as a means to isolate target cells,
e.g. fluorescence activated cell sorting (FACS) (Dunbar et al. 1998, Kas-Deelen et
al. 1998) and magnetic bead technology (Richard et al. 1996, Rogler et al. 1998).

However, those taking a holistic approach may consider this issue unimportant.
There is an equally appropriate view that all those genes showing altered expression
within a compromised tissue should be taken into consideration. After all, since all
tissues are complex masses of different, interacting cell types which intimately
regulate each other's growth and development, it is clear that each cell type could in
some way contribute (positively or negatively) towards the molecular mechanisms
which lie behind responses to external stimuli or neoplastic growth. It is perhaps,
then, more informative to carry out differential display experiments using in vivo as
opposed to in vitro models, since uniform populations of identical cells probably
represent a partial, skewed or even inaccurate picture of the molecular changes that
occur.



The incidence and possible implications of inter-individual biological variation
should be considered in any approach where whole animal models are being used. It
is clear that individuals (humans and animals) respond in different ways to identical
stimuli. One of the best characterized examples is the debrisoquine oxidation
polymorphism, which is mediated by cytochrome CYP2D6 and determines the
pharmacokinetics of many commonly prescribed drugs (Lennard 1993, Meyer and
Zanger 1997). The reasons for such differences are varied and complex, but allelic
variations, regulatory region polymorphisms and even physical and mental health
can all contribute to observed differences in individual responses. Careful thought
should, therefore, be given to the specific objectives of the study and to the possible
value of pooling starting material (tissue/mRNA). Pooling can be
beneficial by ironing out exaggerated responses and unimportant minor
fluctuations of (mechanistically) irrelevant genes in individual animals, thus
providing a clearer overall picture of the general molecular mechanisms of the
response. At the same time, however, such minor variations may be of utmost
importance in deciding the ability of individual animals to succumb to or resist the
effects of a given chemical/disease.
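The trade-off of pooling can be illustrated numerically. The sketch assumes, as a simplification, that pooling equal amounts of material from each animal approximates the mean expression level, and compares each treated animal with the mean control level; the numbers and function name are hypothetical.

```python
from statistics import mean


def pooled_vs_individual(control_levels, treated_levels):
    """Compare per-animal fold changes with the fold change of a pooled sample.

    Pooling equal amounts of material is approximated as averaging the
    expression levels (an assumption; real pooling also depends on yield).
    Each treated animal is compared with the mean control level.
    """
    baseline = mean(control_levels)
    per_animal = [level / baseline for level in treated_levels]
    pooled = mean(treated_levels) / baseline
    return per_animal, pooled


# Hypothetical expression of one gene in four control and four treated animals.
per_animal, pooled = pooled_vs_individual([10, 10, 10, 10], [10, 10, 10, 50])
print(per_animal, pooled)
```

Here a single animal responding fivefold appears in the pool as a modest twofold change, while the other three animals show no response at all: the pooled value smooths the data but hides which individual drives the effect.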



How efficient are differential expression techniques at recovering a high percentage of
differentially expressed genes?

A number of groups have produced experimental data suggesting that mam-
malian cells produce between 8000 and 15000 different mRNA species at any one time
(Mechler and Rabbitts 1981, Hedrick et al. 1984, Bravo 1990), although figures as
high as 20000-30000 have also been quoted (Axel et al. 1976). Hedrick et al. (1984)
provided evidence suggesting that the majority of these belong to the rare abundance
class. A breakdown of this abundance distribution is shown in table 1.

When the results of differential display experiments have been compared with
data obtained previously using other methods, it is apparent that not all differentially
expressed mRNAs are represented in the final display. In particular, rare messages
(which, importantly, often include those encoding regulatory proteins) are not easily recovered
using differential display systems. This is a major shortcoming, as the majority of
mRNA species exist at levels of less than 0.005% of the total population (table 1).
Bertioli et al. (1995) examined the efficiency of DD templates (heterogeneous
mRNA populations) for recovering rare messages and were unable to detect mRNA
species present at less than 1.2% of the total mRNA population, equivalent to an
intermediate or abundant species. Interestingly, when simple model systems (single
target only) were used instead of a heterogeneous mRNA population, the same
primers could detect levels of target mRNA down to 10000-fold lower. These results
are probably best explained by competition for substrates from the many PCR
products produced in a DD reaction.
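The size of this gap can be made explicit with the figures quoted above. The sketch takes the heterogeneous-template detection limit as 1.2% of total mRNA, the single-target improvement as 10000-fold and the rare-class ceiling as 0.005% of total mRNA (a reading of table 1); the constant names and helper are illustrative only.

```python
DD_LIMIT_PCT = 1.2            # lowest abundance detected from heterogeneous templates (% of total mRNA)
SINGLE_TARGET_GAIN = 10_000   # improvement seen with single-target model systems
RARE_CLASS_PCT = 0.005        # upper bound of the rare abundance class (% of total mRNA)


def detectable(abundance_pct: float, limit_pct: float) -> bool:
    """True if a message at abundance_pct (% of total mRNA) is at or above the limit."""
    return abundance_pct >= limit_pct


single_target_limit = DD_LIMIT_PCT / SINGLE_TARGET_GAIN  # about 0.00012 % of total mRNA
shortfall = DD_LIMIT_PCT / RARE_CLASS_PCT                # how far a rare message sits below the DD limit

print(detectable(RARE_CLASS_PCT, DD_LIMIT_PCT))          # heterogeneous DD template
print(detectable(RARE_CLASS_PCT, single_target_limit))   # single-target system
print(round(shortfall))
```

On these figures even the most abundant member of the rare class lies roughly 240-fold below what a heterogeneous DD template could detect, yet comfortably above the single-target limit, which is consistent with the substrate-competition explanation.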

The numbers of differentially expressed mRNAs reported in the literature using
various model systems provide further evidence that many differentially expressed
mRNAs are not recovered. For example, DeRisi et al. (1997) used DNA array
technology to examine gene expression in yeast following exhaustion of sugar in the
medium, and found that more than 1700 genes showed a change in expression of at
least 2-fold. In light of such a finding, it would not be unreasonable to suggest that,
of the 8000-15000 different mRNA species produced by any given mammalian cell,
up to 1000 or more may show altered expression following chemical stimulation.
Whilst this may be an extreme figure, it is known that at least 100 genes are
activated/upregulated in Jurkat (T-) cells following IL-2 stimulation (Ullman et al.
1990). In addition, Wan et al. (1996) estimated that interferon-γ-stimulated HeLa
cells differentially express up to 433 genes (assuming 24000 distinct mRNAs
expressed by the cells). However, there have been few publications documenting
anywhere near the recovery of these numbers. For example, in using DD to compare
normal and regenerating mouse liver, Bauer et al. (1993) found only 70 of 38000
total bands to be different. Of these, 50% (35 genes) were shown to correspond to
differentially expressed bands. Chen et al. (1996) reported 10 genes upregulated in
female rat liver following ethinyl estradiol treatment. McKenzie and Drake (1997)
identified 14 different gene products whose expression was altered by phorbol
myristate acetate (PMA, a tumour promoter agent) stimulation of a human
myelomonocytic cell line. Kilty and Vickers (1997) identified 10 different gene
products whose expression was upregulated in the peripheral blood leukocytes of
allergic disease sufferers. Linskens et al. (1995) found 23 genes differentially
expressed between young and senescent fibroblasts. Techniques other than DD
have also yielded an apparent paucity of differentially expressed genes. Using SH,
for example, Cao et al. (1997) found 15 genes differentially expressed in colorectal
cancer compared with normal mucosal epithelium. Fitzpatrick et al. (1995) isolated 17
genes upregulated in rat liver following treatment with the peroxisome proliferator
clofibrate; Phillips et al. (1990) isolated 12 cDNA clones which were upregulated in
highly metastatic mammary adenocarcinoma cell lines compared with poorly meta-
static ones. Prashar and Weissman (1996) used 3' restriction fragment analysis and
identified approximately 40 genes showing altered expression within 4 h of
activation of Jurkat T-cells. Groenink and Leegwater (1996) analysed 27 gene
fragments isolated using SSH of the delayed early response phase of liver regeneration
and found only 12 to be upregulated.

In this laboratory, SSH was used to isolate up to 70 candidate genes which appear
to show altered expression in guinea pig liver following short-term treatment with
the peroxisome proliferator Wy-14,643 (Rockett, Swales, Esdaile and Gibson,
unpublished observations). However, these findings have still to be confirmed by
analysis of the extracted tissue mRNA for differential expression of these sequences.
Whilst the latest differential display technologies are purported to include design
and experimental modifications to overcome this lack of efficiency (in both the total
number of differentially expressed genes recovered and the percentage that are true
[...] experiments and animals. DD, on the other hand, is not subject to this 'grey
zone' since, unlike SH approaches, it does not amplify the difference in expression
between two samples. Wan et al. (1996) reported that differences in expression of
twofold or more are detectable using DD.



Resolution and visualization of differential expression products 

It seems highly improbable, with current technology, that a gel system could be
developed that is able to resolve all gene species showing altered expression in any
given test system (be it SH- or DD-based). Polyacrylamide gel electrophoresis
(PAGE) can resolve size differences down to 0.2% (Sambrook et al. 1989) and is
used as standard in DD experiments. Even so, it is clear that a complex series of gene
products such as those seen in a DD will contain unresolvable components. Thus,
what appears to be one band in a gel may in fact turn out to be several. Indeed, it has
been well documented (Mathieu-Daude et al. 1996, Smith et al. 1997) that a single
band extracted from a DD often represents a composite of heterogeneous products,
and the same has been found for SSH displays in this laboratory (Rockett et al.
1997). One possible solution was offered by Mathieu-Daude et al. (1996), who
extracted and reamplified candidate bands from a DD display and used single strand
conformation polymorphism (SSCP) analysis to confirm which components
represented the truly differentially expressed product.

Many scientists try to avoid the use of PAGE where possible because it is
technically more demanding than agarose gel electrophoresis (AGE). Unfortunately,
high resolution agarose gels such as Metaphor (FMC, Lichfield, UK) and AquaPor
HR (National Diagnostics, Hessle, UK), whilst easier to prepare and manipulate
than PAGE, can only separate DNA sequences which differ in size by around
1.5-2% (15-20 base pairs for a 1 kb fragment). Thus, SSH, RDA or other such
products which differ in size by less than this amount are normally not resolvable.
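A simple check of whether two products should separate, given the relative resolving powers quoted above (about 0.2% for PAGE and 1.5-2% for high resolution agarose), might look like this. The sketch conservatively uses the 2% figure for agarose and treats resolution as a fixed fraction of the larger fragment, which is only an approximation; the dictionary keys and function name are hypothetical.

```python
# Approximate relative resolving power: the smallest size difference that can be
# separated, as a fraction of fragment size (figures quoted in the text).
RELATIVE_RESOLUTION = {
    "PAGE": 0.002,              # about 0.2 %
    "high-res agarose": 0.02,   # about 1.5-2 %; the conservative 2 % is used
}


def resolvable(size_a_bp: int, size_b_bp: int, gel: str) -> bool:
    """Return True if two fragments should appear as separate bands on `gel`."""
    larger = max(size_a_bp, size_b_bp)
    return abs(size_a_bp - size_b_bp) >= RELATIVE_RESOLUTION[gel] * larger


for gel in RELATIVE_RESOLUTION:
    print(gel, resolvable(1000, 1010, gel))
```

For a 1000 bp and a 1010 bp product, PAGE should separate them while high resolution agarose should not, which is the gap the HA-dye approach aims to close.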
However, a simple technique does in fact exist for increasing the resolving power of
AGE: including HA-red (10-phenyl neutral red-PEG ligand) or HA-yellow
(bisbenzimide-PEG ligand) (Hanse Analytik GmbH, Bremen, Germany) in the
gel separates identical or closely sized products on the basis of base content. Specifically,
HA-red and HA-yellow selectively bind to GC and AT DNA motifs, respectively
(Wawer et al. 1995, Hanse Analytik 1997, personal communication). Since both
HA-stains possess an overall positive charge, they migrate towards the cathode
when an electric field is applied. This is in direct opposition to DNA, which
is negatively charged and, therefore, migrates towards the anode. Thus, if two
DNA clones are identical in size (as perceived on a standard high resolution
agarose gel) but differ in AT/GC content, inclusion of a HA-dye in the gel
will retard the migration of one of the sequences compared with the
other, making it apparently larger and thus providing a means of
differentiating between the two. HA-red has been shown to resolve
sequences with an AT variation of less than 1% (Wawer et al. 1995), whilst Hanse
Analytik have reported that HA staining is so sensitive that in one case it was used
to distinguish two 567 bp sequences which differed by only a single point mutation
(Hanse Analytik 1996, personal communication). Therefore, if one wishes to check
whether all the clones produced from a specific band in a differential display
experiment are derived from the same gene species, a small amount of reamplified
or digested clone can be run on a standard high resolution gel, and a second aliquot
in a similar gel containing one of the HA-stains. The standard gel should indicate
any gross size differences, whilst the HA-stained gel should separate otherwise
co-migrating products according to their base content. Geisinger
et al. (1997) reported successful use of this approach for identifying DD-derived
clones. Figure 10 shows such an experiment carried out in this laboratory on clones
obtained from a band extracted from an SSH display.

[Figure 10. Discrimination of clones of identical/nearly identical size using HA-red. Seven colonies were picked at random from each cloned band.]

An alternative approach is to carry out a 2-D analysis of the differential display
products. In this approach, size-based separation is first carried out in a standard
agarose gel. The gel slice containing the display is then excised and incorporated
into a HA gel for resolution based on AT/GC content.

Of course, one should always consider the possibility of there being different
gene species which are the same size and have the same GC/AT content. However,
even these species are not unresolvable given some effort: again, one might use
SSCP, or perhaps a denaturing gradient gel electrophoresis (DGGE) or temperature
gradient gel electrophoresis (TGGE) approach to resolve the contents of a band,
either directly on the extracted band (Suzuki et al. 1991) or on the reamplified
product.
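The logic of the 2-D strategy (separate by size first, then by base composition, and only then resort to SSCP, DGGE or TGGE) can be sketched as a grouping step on sequenced clones. Tolerances of 2% in size and 1 percentage point in GC content are assumptions chosen to echo the agarose and HA-red figures above; the greedy grouping and the clone names are simplifications for illustration.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G + C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def co_migrating_groups(clones: dict, size_tol: float = 0.02, gc_tol: float = 0.01) -> list:
    """Return groups of clone names that match in both size and GC content.

    Each group is seeded by its first member (greedy grouping); groups with
    more than one member would need SSCP, DGGE or TGGE to resolve further.
    """
    groups = []  # each group is a list of (name, length, gc)
    ordered = sorted(clones.items(), key=lambda item: (len(item[1]), gc_fraction(item[1])))
    for name, seq in ordered:
        length, gc = len(seq), gc_fraction(seq)
        for group in groups:
            _, ref_len, ref_gc = group[0]
            if abs(length - ref_len) <= size_tol * ref_len and abs(gc - ref_gc) <= gc_tol:
                group.append((name, length, gc))
                break
        else:
            groups.append([(name, length, gc)])
    return [[name for name, _, _ in group] for group in groups if len(group) > 1]


clones = {  # hypothetical sequenced clones from one excised band
    "clone1": "A" * 50 + "G" * 50,
    "clone2": "G" * 50 + "A" * 50,
    "clone3": "G" * 80 + "A" * 20,
    "clone4": "A" * 300,
}
print(co_migrating_groups(clones))
```

clone3 has the same size as clone1 and clone2 but a different GC content, so it would be separated by an HA gel; clone1 and clone2 match on both axes and are the pair that would still need one of the conformation- or denaturation-based methods.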

The requirement of some differential display techniques to visualize large
numbers of products (e.g. DD and GEF) can also present a problem in that, in terms
of numbers, the resolution of PAGE rarely exceeds [...] bands. One approach to
overcoming this might be to use 2-D systems, such as those described by Uitterlinden
et al. (1989) and Hatada et al. (1991).
Extraction of differentially expressed bands from a gel can be complex since, in
some cases (e.g. DD, GEF), the results are visualized by autoradiographic means,
such that precise overlay of the developed film on the gel must occur if the correct
band is to be extracted for further analysis. Clearly, a misjudged extraction can
account for many lost man-hours. This problem, and that of the use of radioisotopes,
has been addressed by several groups. For example, Lohmann et al. (1995)
demonstrated that silver staining can be used directly to visualize DD bands in
horizontal polyacrylamide gels. An et al. (1996) avoided the use of radioisotopes by transferring a
small amount (20-30%) of the DNA from their DD to a nylon membrane, and
visualizing the bands using chemiluminescent staining before going back to extract
the remaining DNA from the gel. Chen and Peck (1996) went one step further and
transferred the entire DD to a nylon membrane. The DNA bands were then
visualized using a digoxigenin (DIG) system (DIG was attached to the poly(dT)
primers used in the differential display procedure). Differentially expressed bands
were cut from the membrane and the DNA eluted by washing with PCR buffer prior
to reamplification.

One of the advantages of using techniques such as SSH and RDA is that the final
display can be run on an agarose gel and the bands visualized with simple ethidium
bromide staining. Whilst this approach can provide acceptable results, overstaining
with SYBR Green I or SYBR Gold nucleic acid stains (FMC) enhances
the intensity and sharpness of the bands. This greatly aids their precise extraction
and often reveals faint products that might otherwise be overlooked. Although
differential displays stained with SYBR Green I are better visualized using short
wavelength UV (254 nm) than medium wavelength UV (306 nm), the shorter
wavelength is much more damaging to DNA. In practice, it takes only a few seconds
to damage DNA extracted under 254 nm irradiation, effectively preventing
reamplification and cloning. The best approach is to overstain with SYBR Green I
and extract bands under medium wavelength UV transillumination.



The possible use of 'microfingerprinting' to reduce complexity

Given the sheer number of gene products and the possible complexity of each
band, an alternative approach to rapid characterization may be to use an enhanced
analysis of a small section of a differential display: a 'sub-fingerprint' or 'micro-
fingerprint'. In this case, one could concentrate on those bands which only appear
in a particular chosen size region. Reducing the fingerprint in this way has at least
two advantages. One is that it should be possible to use different gel types,
concentrations and run times tailored exactly to that region. Currently, one might
run products from 100-3000 bp on the same gel, which leads to compromise in the
gel system being used and consequently to suboptimal resolution, both in terms of
size and numbers, and can lead to problems in the accurate excision of individual
bands. Secondly, it may be possible to enhance resolution by using a 2-D analysis
using a HA-stain, as described earlier. In summary, if a range of gene product sizes
is carefully chosen to include certain 'relevant' genes, the 2-D system standardized,
and appropriate gene analysis used, it may be possible to develop a method for the
early and rapid identification of compounds which have similar or widely different
cellular effects. If the prognosis for exposure to one or more other chemicals which
display a similar profile is already known, then one could perhaps predict similar
effects for any new compounds which show a similar micro-fingerprint.
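One way the proposed micro-fingerprint comparison might be scored is sketched below: bands outside the chosen size window are discarded, and the remaining band set is compared with those of reference compounds whose effects are known, using a Jaccard index. The Jaccard index, the exact-size band matching (which assumes bands have already been called on a common size grid) and all names and values are assumptions, not part of the approach described above.

```python
def in_window(bands_bp, lo_bp: int, hi_bp: int) -> set:
    """Keep only bands whose size (bp) lies inside the chosen window."""
    return {b for b in bands_bp if lo_bp <= b <= hi_bp}


def jaccard(a: set, b: set) -> float:
    """Shared bands divided by all bands seen in either profile."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def closest_reference(query_bands, references: dict, lo_bp: int, hi_bp: int):
    """Return (reference name, similarity) for the best-matching reference compound."""
    query = in_window(query_bands, lo_bp, hi_bp)
    scores = {name: jaccard(query, in_window(bands, lo_bp, hi_bp))
              for name, bands in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


references = {  # hypothetical profiles for compounds whose effects are known
    "compound_A": [150, 220, 310, 480],
    "compound_B": [160, 230, 300, 900],
}
new_compound = [150, 220, 310, 700]
print(closest_reference(new_compound, references, lo_bp=100, hi_bp=500))
```

Restricting the comparison to the 100-500 bp window drops the 700 bp and 900 bp bands entirely, so the score reflects only the region the gel system was optimized for.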
An alternative approach to microfingerprinting is to examine altered expression
in specific families of genes through careful selection of PCR primers and/or post-
reaction analysis. Stress genes, growth factors and/or their receptors, cell cycling
genes, cytochromes P450 and regulatory proteins might be considered as candidates.
Some off-the-shelf DNA arrays (e.g. Clontech's
Atlas cDNA Expression Array series) already anticipate this to some degree by
grouping together genes involved in different responses, e.g. apoptosis, stress, DNA
damage response etc.



Screening 

False positives 

False positives have been discussed at length in the literature
(Liang et al. 1993, 1995, Nishio et al. 1994, Sun et al.
1994, Sompayrac et al. 1995). The reason for false positives varies with the
technique employed. In RDA, the use of adaptors which have not
been HPLC purified can lead to the production of false positives through illegitimate
ligation events (O'Neill and Sinclair 1997), whilst in DD they can arise through
[...] transcription of rRNA. In SH, false positives appear [...];
some may arise from
cDNA/mRNA species which do not undergo hybridization for technical reasons.

A quick screening of putative differentially expressed clones can be carried out
using a simple dot blot approach, in which labelled first strand probes synthesized
from tester and driver mRNA are hybridized to a gridded array of said clones (Hedrick et
al. 1984, Sakaguchi et al. 1986). Differentially expressed clones will hybridize to
tester probe, but not driver. The disadvantage of this approach is that rare species
may not generate detectable hybridization signals. One option for those using SSH
is to screen the clones using a labelled probe generated from the subtracted cDNA
from which they were derived, and with a probe made from the reverse subtraction
reaction (ClonTechniques 1997a). Since the SSH method enriches rare sequences,
it should be possible to confirm the presence of clones representing low abundance
genes. Despite this quick screening step, there is still the need to go back to the
original mRNA and confirm the altered expression using a more quantitative
approach. Although this may be achieved using Northern blots, the sensitivity is
poor by today's high standards and one must rely on PCR methods for accurate and
sensitive determinations (see below).



Sequence analysis 

The majority of differential display procedures produce final products which are
between 100 and 1000 bp in size. Such short products may considerably reduce the
length of sequence available for searching the DNA databases. This in turn leads to
reduced confidence in the result: several families of genes have members whose DNA
sequences are almost identical except in a few key stretches, e.g. the cytochrome
P450 gene superfamily (Nelson et al. 1996). Thus, does a clone identified as being
almost identical to gene X1 really come from that gene, or from its brother gene X2, or its
as yet undiscovered sister X3? For example, using SSH, part of a gene was isolated
which was up-regulated in the liver of rats exposed to Wy-14,643 and was identified
by a FASTA search as being transferrin (data not shown). However, transferrin is
known to be downregulated by hypolipidemic peroxisome proliferators such as Wy-
14,643 (Hertz et al. 1996), and this was confirmed with subsequent RT-PCR
analysis. This suggests that the gene sequence isolated may belong to a gene which
is closely related to transferrin, but is regulated by a different mechanism.

A further problem associated with SH technology is redundancy. In most cases,
before SH is carried out, the cDNA population must first be simplified by restriction
digestion. This is important for at least two reasons:

(1) To reduce complexity: long cDNA fragments may form complex networks
which prevent the formation of appropriate hybrids, especially at the high
concentrations required for efficient hybridization.

(2) Cutting the cDNAs into small fragments provides better representation of
individual genes. This is because genes derived from related but distinct
members of gene families often have similar coding sequences that may cross-
hybridize and be eliminated during the subtraction procedure (Ko 1990).
Furthermore, different fragments from the same cDNA may differ considerably
in terms of hybridization and amplification and, thus, may not do one
or the other efficiently (Wang and Brown 1991). Thus, some fragments from differentially
expressed cDNAs may be eliminated during subtractive hybridization pro-
cedures, while other fragments may be enriched and isolated. As a
consequence, some genes will be cut one or more times, giving rise to two
or more fragments of different sizes. If those same genes are differentially
expressed, then two or more of the different size fragments may come through
as separate bands on the final differential display, increasing the observed
redundancy and the number of redundant sequencing reactions.
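The redundancy problem can be illustrated with an in silico digest. The sketch assumes a four-base blunt cutter with recognition site GTAC, cut between GT and AC (the RsaI site, commonly used in SSH protocols, although the text above does not name an enzyme), a hypothetical cDNA and a hypothetical display window of 100-1000 bp.

```python
def digest(seq: str, site: str = "GTAC", cut_offset: int = 2) -> list:
    """Blunt-cut `seq` at every occurrence of `site`, `cut_offset` bases into it."""
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return [f for f in fragments if f]  # drop empty pieces


def bands_in_window(fragments, lo_bp=100, hi_bp=1000):
    """Count fragments whose length falls inside the display window."""
    return sum(1 for f in fragments if lo_bp <= len(f) <= hi_bp)


cdna = "A" * 100 + "GTAC" + "C" * 200 + "GTAC" + "T" * 50  # hypothetical cDNA
fragments = digest(cdna)
print([len(f) for f in fragments], bands_in_window(fragments))
```

This single hypothetical cDNA, cut twice, yields fragments of 102, 204 and 52 bp; two of them fall inside the window, so one differentially expressed gene could appear as two bands and be sequenced twice.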

Sequence comparisons also throw up another important point: at what degree
of sequence similarity does one accept a result? Is 90% identity between a gene
derived from your model species and one from another species acceptably close? Is 95% between
your sequence and one from the same species also acceptable? This problem is
particularly relevant when the forward and reverse sequence comparisons give
similar sequences from completely different gene species! An arbitrary decision
seems to be to allocate genes that are definite (95% similarity and above) and then
group those between 60 and 95% as being related or possible homologues.
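The arbitrary cut-offs above can be written down explicitly, which at least makes them consistent across a project. The sketch assumes the sequences have already been aligned (gap columns are skipped) and uses 95% and 60% as the boundaries; the helper names and the treatment of gaps are illustrative choices.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns, skipping columns with a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100 * matches / len(columns)


def classify_match(identity_pct: float, definite: float = 95.0, related: float = 60.0) -> str:
    """Apply the two-tier cut-offs: definite, related/possible homologue, or unassigned."""
    if identity_pct >= definite:
        return "definite"
    if identity_pct >= related:
        return "related/possible homologue"
    return "unassigned"


print(classify_match(percent_identity("ACGTACGTAC", "ACGTACGTAA")))
```

A 90% match falls into the "related/possible homologue" tier, which is exactly the ambiguous zone discussed above for cross-species comparisons.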

Quantitative analysis 

At some point, one must give consideration to the quantitative analysis of the candidate genes, either as a means of confirming that they are truly differentially expressed, or in order to establish just what the differences are. Northern blot analysis is a popular approach as it is relatively easy and quick to perform. However, the major drawback with Northern blots is that they are often not sensitive enough to detect rare sequences. Since the majority of messages expressed in a cell are of low abundance (see table 1), this is a major problem. Consequently, RT-PCR may be the method of choice for confirming differential expression. Although the procedure is somewhat more complex than Northern analysis, requiring synthesis of primers and optimization of reaction conditions for each gene species, it is now possible to set up high-throughput PCR systems using multichannel pipettes, 96+-well plates and appropriate thermal cycling technology. Whilst fully quantitative (competitive) analysis is more desirable, being more accurate and not reliant on an internal standard, the money and time needed to develop a competitor molecule is often excessive, especially when one might be examining tens or even hundreds of gene species. Semi-quantitative analysis is simpler, although still relatively involved. One must first of all choose an internal standard whose expression does not change in the test cells compared to the controls. Numerous reference genes have been tried in the past, for example interferon-gamma (IFN-γ; Five et al. 1989), β-actin (Heuval et al. 1994), glyceraldehyde-3-phosphate dehydrogenase (GAPDH; Wong et al. 1994), dihydrofolate reductase (DHFR; Mohler and Butler 1991), β2-microglobulin (β2m; Murphy et al. 1990), hypoxanthine phosphoribosyl transferase (HPRT; Foss et al. 1998) and a number of others (ClonTechniques 1997b). Ideally, an internal standard should not change its level of expression in the cell regardless of cell age, stage in the cell cycle or the effects of external stimuli. However, it has been shown on numerous occasions that the levels of most housekeeping genes currently used by the research community do in fact change under certain conditions and in different tissues (ClonTechniques 1997b). It is imperative, therefore, that preliminary experiments be carried out on a panel of housekeeping genes to establish their suitability for use in the model system.
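The preliminary screen of a housekeeping panel, followed by semi-quantitative normalization, could look like the sketch below. It assumes a signal (for example, densitometry of RT-PCR bands) is available for each candidate reference gene in every control and treated sample. Candidates are ranked by coefficient of variation. This is a deliberately simple stability heuristic chosen here for illustration, and dedicated reference-gene ranking algorithms exist that would be preferable in practice. All gene values and function names are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean; lower means more stable."""
    return stdev(values) / mean(values)


def most_stable_reference(panel):
    """Return the candidate housekeeping gene with the lowest CV across samples.

    panel maps gene name -> signal in every sample, controls and treated alike.
    """
    return min(panel, key=lambda gene: coefficient_of_variation(panel[gene]))


def normalized_fold_change(target_ctrl, ref_ctrl, target_test, ref_test):
    """(target / reference) in test cells divided by (target / reference) in controls."""
    return (target_test / ref_test) / (target_ctrl / ref_ctrl)


# Hypothetical signals for two control and two treated samples.
panel = {
    "GAPDH": [1000, 1040, 1500, 1620],  # rises on treatment: a poor internal standard
    "b-actin": [800, 820, 790, 810],    # roughly constant across samples
    "HPRT": [300, 260, 340, 310],
}
reference = most_stable_reference(panel)
print(reference)
print(normalized_fold_change(200, 800, 600, 800))
```

In this invented panel, GAPDH shifts with treatment and would distort any ratio it is used to normalize. That is exactly the failure the pilot experiment is meant to catch.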

Interpretation of quantitative data must also be treated with caution. By comparing the lists of genes identified by differential expression, one can perhaps gain insight into why two different species react in different ways to external stimuli. For example, rats and mice appear sensitive to the non-genotoxic effects of a wide range of peroxisome proliferators (PPs), whilst Syrian hamsters and guinea pigs are largely resistant (Orton et al. 1984, Rodricks and Turnbull 1987, Lake et al. 1989, 1993, Makowska et al. 1992). A simplified approach to resolving the reason(s) why is to compare lists of up- and down-regulated genes in order to identify those which are expressed in only one species and which, through background knowledge of the effects of the said gene, might suggest a mechanism of facilitated non-genotoxic carcinogenesis or protection. Of course, the situation is likely to be far more complex. Suppose there were one key gene protecting the guinea pig from non-genotoxic effects, and it was upregulated 50 times by PPs; the same gene might only be up-regulated five times in the rat. Since the gene was noted to be upregulated in both species, however, its importance may be overlooked. Just to complicate matters, a large change in expression does not necessarily mean a biologically important change. For example, what is the true relevance of gene Y, which shows a 50-fold increase after a particular treatment, compared with gene Z, which shows only a 5-fold increase? If one examines the literature, one may find that historically gene Y has often been shown to be up-regulated 40-60-fold by a number of unrelated stimuli; in light of this, the 50-fold increase would appear less significant. However, the literature may show that gene Z has never been recorded as having more than doubled in expression, which makes your 5-fold increase all the more exciting. Perhaps even more interesting is if that same 5-fold increase has only been seen in related neoplasms following treatment with related chemicals.
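One way to make the gene Y / gene Z comparison explicit is to score each observed fold change against the largest change previously reported for that gene. This is a hypothetical scoring rule for illustration only. It is not a method proposed in the text. The historical maxima used here (60-fold for gene Y, 2-fold for gene Z) are assumed values chosen to match the example.

```python
def context_score(observed_fold, historical_max_fold):
    """Observed fold change divided by the largest change previously reported.

    A score at or below 1 means the change is within what the literature
    already shows for this gene; a score well above 1 means it is unprecedented.
    """
    if historical_max_fold <= 0:
        raise ValueError("historical maximum must be positive")
    return observed_fold / historical_max_fold


# gene: (observed fold change, assumed largest fold change previously recorded)
genes = {
    "gene Y": (50.0, 60.0),
    "gene Z": (5.0, 2.0),
}
ranked = sorted(genes, key=lambda g: context_score(*genes[g]), reverse=True)
for g in ranked:
    print(g, round(context_score(*genes[g]), 2))
```

Under this rule, the 5-fold change in gene Z outranks the 50-fold change in gene Y, which is the point the paragraph makes in words.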



Problems in using the differential display approach

Differential display technology originally held promise of an easily obtainable 'fingerprint' of those genes which are up- or down-regulated in test animals/cells in a developmental process or following exposure to given stimuli. However, it has become clear that the fingerprinting process, whilst still valid, is much too complex to be represented by a single technique profile. This is because all differential display techniques have common and/or unique technical problems which preclude the isolation and identification of all those genes which show changes in expression. Furthermore, there are important genetic changes related to disease development which differential expression analysis is simply not designed to address. An example of this is the presence of small deletions, insertions, or point mutations such as those seen in activated oncogenes, tumour suppressor genes and individual polymorphisms. Polymorphic variations, small though they usually are, are often regarded as being of paramount importance in explaining why some patients respond better than others to certain drug treatments (and, by logical extension, why some people are less affected by potentially dangerous xenobiotics/carcinogens than others). The identification of such point mutations and naturally occurring polymorphisms requires the subsequent application of sequencing, SSCP, DGGE or TGGE to the gene of interest. In addition, differential display is not designed to address issues such as alternatively spliced gene species, or whether an increased abundance of mRNA is a result of increased transcription or increased mRNA stability.
Conclusions

Perhaps the main advantage of open system differential display techniques is that they are not limited by extant theories or researcher bias in revealing genes which are differentially expressed, since they are designed to amplify all genes which demonstrate altered expression. This means that they are useful for the isolation of previously unknown genes which may turn out to be useful biomarkers of a particular state or condition. At least one open system (SAGE) is also quantitative, thus eliminating the need to return to the original mRNA and carry out Northern/PCR analysis to confirm the result. However, the rapid progress of genome mapping projects means that over the next 5-10 years or so, the balance of experimental use will switch from open to closed differential display systems, particularly DNA arrays. Arrays are easier and faster to prepare and use, provide quantitative data, are suitable for high-throughput analysis and can be tailored to look at specific signalling pathways or families of genes. Identification of all the gene sequences in humans and common laboratory animals, combined with improved DNA array technology, means that it will soon no longer be necessary to try to isolate differentially expressed genes using the technically more demanding open system approach. Thus, the main advantage of open systems (that of identifying unknown genes) will be largely eradicated. It is likely, therefore, that their sphere of application will be reduced to analysis of the less common laboratory species, since it will be some time yet before the genomes of such animals as zebrafish, electric eels, gerbils, crayfish and squid, for example, are sequenced.
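SAGE reads out abundance by counting sequenced tags, so its quantification step reduces to counting and normalizing. The sketch below is minimal. It assumes the short tags have already been extracted from the sequenced concatemers. The 10-nt tag strings and library sizes are invented for illustration. The optional pseudocount is our own addition, to avoid dividing by zero for tags that are missing from the control library.

```python
from collections import Counter


def tags_per_million(tags):
    """Abundance of each SAGE tag per million tags sequenced from one library."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: 1e6 * n / total for tag, n in counts.items()}


def abundance_ratios(control_tags, treated_tags, pseudocount=0):
    """Treated / control ratio of each tag's share of its own library.

    A pseudocount > 0 keeps tags that are absent from the control library.
    """
    ctrl, trt = Counter(control_tags), Counter(treated_tags)
    n_ctrl, n_trt = sum(ctrl.values()), sum(trt.values())
    ratios = {}
    for tag in set(ctrl) | set(trt):
        control_share = (ctrl[tag] + pseudocount) / n_ctrl
        if control_share == 0:
            continue  # ratio undefined without a pseudocount
        ratios[tag] = ((trt[tag] + pseudocount) / n_trt) / control_share
    return ratios


# Hypothetical libraries of different sizes (8 and 16 tags).
control = ["GTGAAACCCC"] * 4 + ["TTGGCAAGCC"] * 4
treated = ["GTGAAACCCC"] * 12 + ["TTGGCAAGCC"] * 4
print(tags_per_million(control))
print(abundance_ratios(control, treated))
```

Normalizing to library size is what allows libraries sequenced to different depths to be compared directly.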

Of course, in the end the question will always remain: what is the functional/biological significance of the identified, differentially expressed genes? One persistent problem is understanding whether differentially expressed genes are a cause or a consequence of the altered state. Furthermore, many chemicals, such as non-genotoxic carcinogens, are also mitogens, so genes associated with replication will also be upregulated but may have little or nothing to do with the carcinogenic effect. Whilst differential display technology cannot hope to answer these questions, it does provide a springboard from which identification, regulation and functional studies can be launched. Understanding the molecular mechanism of cellular responses is almost impossible without knowing the regulation and function [...] photographs, showing details of a fixed moment in time. Consider the historian who knows the outcome of a battle and the placement and condition of the troops before the battle commenced, but is asked to try and deduce how the battle progressed and why it ended as it did from a few still photographs: an impossible task. In order to understand the battle, the historian [...] officers, what the orders were and whether they were obeyed. He must examine the [...] consider the effect the prevailing weather conditions exerted. Likewise, if mechanistic answers are to be forthcoming, the scientist must use differential display in combination with other techniques such as time and dose response analyses. Although this review has emphasised the importance of differential gene profiling, it should not be considered in isolation, and the full impact of this approach will be strengthened if it is used in combination with functional genomics and proteomics (2-dimensional protein gels using isoelectric focusing and subsequent SDS electrophoresis, and virtual 2D-maps using capillary electrophoresis). Proteomics is attracting much recent attention as many of the changes resulting in differential gene expression do not involve changes in mRNA levels, as described extensively herein, but rather protein-protein, protein-DNA and protein phosphorylation events, which would require functional genomics or proteomic technologies for investigation.

Despite the limitations of differential display technology, it is clear that many [...] understanding such molecular changes are almost [...]. [...] fingerprints could indicate the family or even the specific type of chemical an individual has been exposed to, plus the length and/or acuteness of that exposure, thus indicating the most prudent treatment. They may also help uncover differences in histologically identical cancers, provide diagnostic tests for the earliest stages of neoplasia and perhaps indicate the most efficacious treatment.

[...] DNA sequence [...] completed early in the next century, and the [...] development and evolution of differential gene expression technology will ensure that this knowledge contributes fully to the understanding of human disease processes.
ABSTRACT The recent ability to sequence whole genomes
allows ready access to all genetic material. The approaches
outlined here allow automated analysis of sequence for the
synthesis of optimal primers in an automated multiplex
oligonucleotide synthesizer (AMOS). The efficiency is such
that all ORFs for an organism can be amplified by PCR. The
resulting amplicons can be used directly in the construction of
DNA arrays or can be cloned for a large variety of functional
analyses. These tools allow a replacement of single-gene
analysis with a highly efficient whole-genome analysis.



The genome sequencing projects have generated and will 
continue to generate enormous amounts of sequence data. The 
genomes of Saccharomyces cerevisiae, Escherichia coli, Haemophilus influenzae (1), Mycoplasma genitalium (2), and Methanococcus jannaschii (3) have been completely sequenced.
Other model organisms have had substantial portions of their
genomes sequenced as well, including the nematode Caeno- 
rhabditis elegans (4) and the small flowering plant Arabidopsis 
thaliana (5). This massive and increasing amount of sequence 
information allows the development of novel experimental 
approaches to identify gene function. 

One standard use of genome sequence data is to attempt to 
identify the functions of predicted open reading frames 
(ORFs) within the genome by comparison to genes of known 
function. Such a comparative analysis of all ORFs to existing 
sequence data is fast, simple, and requires no experimentation 
and is therefore a reasonable first step. While finding sequence
homologies/motifs is not a substitute for experimentation,
noting the presence of sequence homology and/or sequence
motifs can be a useful first step in finding interesting genes, in
designing experiments and, in some cases, predicting function.
However, this type of analysis is frequently uninformative. For
example, over one-half of new ORFs in S. cerevisiae have no
known function (6). If this is the case in a well studied organism 
such as yeast, the problem will be even worse in organisms that 
are less well studied or less manipulable. A large, experimen- 
tally determined gene function database would make homol- 
ogy/motif searches much more useful. 

Experimental analysis must be performed to thoroughly 
understand the biological function of a gene product. Scaling 
up from classical "cottage industry" one-gene-oriented ap- 
proaches to whole-genome analysis would be very expensive 
and laborious. It is clear that novel strategies are necessary to 
efficiently pursue the next phase of the genome projects, namely
whole-genome experimental analysis to explore gene expres- 
sion, gene product function, and other genome functions. 
Model organisms, such as S. cerevisiae, will be extremely
important in the development of novel whole-genome analysis
techniques and, subsequently, in improving our understanding 
of other more complex and less manipulable organisms. 

The genome sequence can be systematically used as a tool 
to understand ORFs, gene product function, and other ge- 
nome regions. Toward this end, a directed strategy has been 
developed for exploiting sequence information as a means of 
providing information about biological function (Fig. 1). Ef- 
forts have been directed toward the amplification of each 
predicted ORF or any other region of the genome ranging 
from a few base pairs to several kilobase pairs. There are many 
uses for these amplicons — they can be cloned into standard 
vectors or specialized expression vectors, or can be cloned into 
other specialized vectors such as those used for two-hybrid 
analysis. The amplicons can also be used directly by, for
example, arraying onto glass for expression analysis, for DNA 
binding assays, or for any direct DNA assay (7). As a pilot 
study, synthetic primers were made on the 96-well automated 
multiplex oligonucleotide synthesizer (AMOS) instrument (8) 
(Fig. 2). These oligonucleotides were used to amplify each 
ORF on yeast chromosome V. The current version of this 
instrument can synthesize three plates of 96 oligonucleotides 
each (25 bases) in an 8-hr day. The amplification of the entire 
set of PCR products was then analyzed by gel electrophoresis 
(Fig. 3). A product of the proper length was obtained on the
first attempt for 95% of the amplifications. This project demonstrates that
one can go directly from sequence information to biological 
analysis in a truly automated, totally directed manner. 
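The throughput figures above translate into a simple planning calculation. This sketch assumes 96-well plates, three plates per 8-hr day, and forward and reverse primers on separate plates in matching wells, as described in this section. It ignores failed syntheses and any resynthesis. The 240-ORF and 6,000-ORF cases correspond to the chromosome V pilot and the whole yeast genome mentioned elsewhere in this section. The function name is hypothetical.

```python
import math


def primer_plan(n_orfs, wells_per_plate=96, plates_per_day=3):
    """Plates and synthesizer days for one forward and one reverse primer per ORF.

    Forward and reverse primers go on separate plates in matching wells, so
    every block of up to 96 ORFs needs one forward plate and one reverse plate.
    """
    plates_per_strand = math.ceil(n_orfs / wells_per_plate)
    total_plates = 2 * plates_per_strand
    return total_plates, math.ceil(total_plates / plates_per_day)


for label, n in [("chromosome V pilot", 240), ("whole yeast genome", 6000)]:
    plates, days = primer_plan(n)
    print(f"{label}: {n} ORFs -> {plates} plates, about {days} synthesis days")
```

On these assumptions, primers for the whole yeast genome need about two months of synthesizer time on a single instrument.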

These amplicons can be incorporated directly in arrays or 
the amplicons can be cloned. If the amplicons are to be cloned,
novel sequences can be incorporated at the 5' end of the 
oligonucleotide to facilitate cloning. One potential problem 
with cloning PCR products is that the cloned amplicons may 
contain sequence alterations that diminish their utility. One 
option would be to resequence each individual amplicon. 
However, this is expensive, inefficient, and time consuming. A
faster, more cost-effective, and more accurate approach is to
apply comparative sequencing by denaturing HPLC (9). This 
method is capable of detecting a single base change in a 2-kb 
heteroduplex. Longer amplicons can be analyzed by use of 
appropriate restriction fragments. If any change is detected in 
a clone, an alternate clone of the same region can be analyzed. 
Modifying the system to allow high throughput analysis by 
denaturing HPLC is also relatively simple and straightforward. 
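Splitting a long amplicon into restriction fragments that stay under the ~2-kb denaturing HPLC limit can be checked with a few lines of code. These helpers are hypothetical and for illustration only. They scan only the top strand for the recognition site, which is adequate for palindromic sites such as EcoRI (GAATTC, cut after the G). They do not choose enzymes, and they ignore sticky-end overhangs.

```python
def cut_positions(sequence, site, cut_offset):
    """Top-strand cut positions for every occurrence of a recognition site."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    return positions


def fragment_lengths(length, cuts):
    """Lengths of the pieces produced by cutting a molecule at the given positions."""
    bounds = [0] + sorted(cuts) + [length]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


def fits_dhplc_limit(sequence, site, cut_offset, max_len=2000):
    """(ok, fragment lengths): ok is True if every fragment is at most max_len bp."""
    fragments = fragment_lengths(len(sequence), cut_positions(sequence, site, cut_offset))
    return all(f <= max_len for f in fragments), fragments


# Hypothetical 3,006-bp amplicon with a single EcoRI site (G^AATTC, so cut_offset=1).
amplicon = "A" * 1500 + "GAATTC" + "T" * 1500
print(fits_dhplc_limit(amplicon, "GAATTC", 1))
print(fits_dhplc_limit("A" * 3000, "GAATTC", 1))
```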

If amplicons are used directly on arrays without cloning, it
is important to note that, even if single PCR product bands are
observed on gels, the PCR products will be contaminated with
various amounts of other sequences. This contamination has
the potential to affect the results in, for example, expression
analysis. On the other hand, direct use of the amplicons is
much less labor intensive and greatly decreases the occurrence
of mistakes in clone identification, a ubiquitous problem
associated with large clone set archiving and retrieving.

Fig. 1. Overview of systematic method for isolating individual
genes. Sequence information is obtained automatically from sequence
databases. The data are input into primer selection software specifically
designed to target ORFs as designated by database annotations.
The output file containing the primer information is directly read by
a high-throughput oligonucleotide synthesizer, which makes the
oligonucleotides in 96-well plates (AMOS, automated multiplex
oligonucleotide synthesizer). The forward and reverse primers are
synthesized in the same location on separate plates to facilitate the
downstream handling of primers. The amplicons are generated by PCR in
96-well plates as well.

Any large-scale effort to capture each ORF within a genome 
must rely on automation if cost is to be minimized while 
efficiency is maximized. Toward that end, primers targeting 
ORFs were designed automatically using simple new scripts 
and existing primer selection software. These script-selected 
primer sequences were directly read by the high-throughput 
synthesizer and the forward and reverse primers were synthe- 
sized in separate plates in corresponding wells to facilitate 
automated pipetting and PCR amplifications. Each of the 
resulting PCR products, generated with minimum labor, con- 
tains a known, unique ORF. 
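The automated primer step can be sketched as follows. This is a deliberately naive stand-in for real primer-selection software. It takes the first and last `primer_len` bases of each annotated ORF (25 by default, matching the oligonucleotide length mentioned earlier) and reverse-complements the 3' end. It then assigns matching wells on the forward and reverse plates. It assumes top-strand ORFs with 0-based, end-exclusive coordinates. It ignores melting temperature, GC content and self-complementarity, all of which real primer design must handle. All names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def well_name(index):
    """Map 0..95 onto a row-major 8 x 12 plate: 0 -> A1, 12 -> B1, 95 -> H12."""
    row, col = divmod(index % 96, 12)
    return f"{'ABCDEFGH'[row]}{col + 1}"


def design_orf_primers(genome, orfs, primer_len=25):
    """Naive primers at the two ends of each ORF.

    orfs: list of (name, start, end), 0-based and end-exclusive, top strand.
    Each row is (plate, well, name, forward, reverse); the forward and
    reverse primers share plate number and well, so they can be synthesized
    in corresponding wells of separate forward and reverse plates.
    """
    rows = []
    for i, (name, start, end) in enumerate(orfs):
        forward = genome[start:start + primer_len]
        reverse = reverse_complement(genome[end - primer_len:end])
        rows.append((i // 96 + 1, well_name(i), name, forward, reverse))
    return rows


# Toy 18-nt "genome" with one ORF and 6-nt primers, just to show the layout.
print(design_orf_primers("ATGCCCAAATTTGGGTAA", [("orfX", 0, 18)], primer_len=6))
```

Writing the rows straight to a file the synthesizer can read is what removes manual handling from the pipeline. The file format would depend on the instrument and is not shown.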

Large-scale genome analysis projects are dependent on 
newly emerging technologies to make the studies practical and 
economically feasible. For example, the cost of the primers, a 
significant issue in the past, has been reduced dramatically to 
make feasible this and other projects that require tens of 
thousands of oligonucleotides. Other methods of high- 
throughput analysis are also vital to the success of functional 
analysis projects, such as microarraying and oligonucleotide 
chip methods (10-14). 

Changes in attitude are also required. One of the major costs
of commercial oligonucleotides is extensive quality control 
such that virtually 100% of the supplied oligonucleotides are 
successfully synthesized and work for their intended purpose. 
Fig. 2. Overall approach for using the database of a genome to direct
biological analysis. The synthesis of all 6,000 ORFs of S. cerevisiae
(one for each gene) can be used in many applications utilizing both
cloning and microarraying technology.

Considerable cost reduction can be obtained by simply de- 
creasing the expected successful synthesis rate to 95-97%. One 
can then achieve faster and cheaper whole genome coverage by 
simply adding a single quality control step at the end of the
experiment and batching the failures for resynthesis. 
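The "single QC step, then batch the failures" strategy can be estimated with a geometric model. This sketch assumes that every synthesis attempt, including resyntheses, succeeds independently with the same probability. It uses 12,000 oligonucleotides, which corresponds to one forward and one reverse primer for each of ~6,000 yeast ORFs. The function name is hypothetical.

```python
def missing_after_rounds(n_oligos, success_rate, rounds):
    """Expected number of oligos still missing after each synthesis round."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    missing = float(n_oligos)
    history = []
    for _ in range(rounds):
        missing *= 1.0 - success_rate  # only the failures are batched for the next round
        history.append(missing)
    return history


for rate in (0.95, 0.97):
    print(rate, [round(x, 1) for x in missing_after_rounds(12000, rate, 3)])
```

At 95% success, the first pass leaves about 600 failures. A second, much smaller batch reduces this to about 30, which supports the argument that a lower per-oligo success rate costs relatively little in total effort.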

The directed nature of the amplicon approach is of clear 
advantage. The sequence of each ORF is analyzed automati- 
cally, and unique specific primers are made to target each 
ORF. Thus, there is relatively little time or labor involved — for 
example, no random cloning and subsequent screening is 
required because each product is known. In the test system, 
primers for 240 ORFs from chromosome V were systematically 
synthesized, beginning from the left arm and continuing 
through to the right arm. At no point was there any manual 
analysis of sequence information to generate the collection. In 
many ways, now that the sequence is known, there is no need 
for the researcher to examine it. 

These amplicons can be arrayed and expression analysis can 
be done on all arrayed ORFs with a single hybridization (10). 
Those ORFs that display significant differential expression 
patterns under a given selection are easily identified without 
the laborious task of searching for and then sequencing a clone. 
Once scaled up, the procedure provides even greater returns 
on effort, because a single hybridization will ultimately provide 
a "snapshot" of the expression of all genes in the yeast genome. 
Thus, the limiting factor in whole genome analysis will not be 
the analysis process itself, but will instead be the ability of 
researchers to design and carry out experimental selections. 

Current expression and genetic analysis technologies are 
geared toward the analysis of single genes and are ill suited to 
analyze numerous genes under many conditions. Additional 
difficulties with current technologies include: the effort and 
expense required to analyze expression and make mutants, the 
potential duplication of effort if done by different laboratories, 
and the possibility of conflicting results obtained from differ- 
ent laboratories. In contrast, whole genome analysis not only 
is more efficient, it also provides data of much higher quality; 
all genes are assayed and compared in parallel under exactly 
the same conditions. In addition, amplicons have many applications beyond gene expression. For example, one recent approach is to incorporate a unique DNA sequence tag, synthesized as part of each gene-specific primer, during amplification. The tags or molecular bar codes, when reintroduced into the organism as a gene deletion or as a gene clone, can be used much more efficiently than individual mutations or clones because pools of tagged mutants or transformants can be analyzed in parallel. This parallel analysis is possible because the tags are readily and quantitatively amplified even in complex mixtures of tags (13).
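In the parallel analysis of a tagged pool, each tag's share of the pool is compared before and after a selection. This is a minimal sketch. It assumes the amount of each tag has already been measured in both samples, by hybridization or counting. The tag names, counts and pseudocount are hypothetical, and using the log2 share ratio as the summary statistic is our own choice.

```python
import math


def tag_log2_changes(before, after, pseudocount=0.5):
    """log2 change in each tag's share of a pooled culture across a selection.

    before/after map tag -> measured amount (e.g. read count or array signal).
    Negative values mean the tagged strain was depleted by the selection.
    """
    n_before = sum(before.values())
    n_after = sum(after.values())
    changes = {}
    for tag, amount in before.items():
        share_before = (amount + pseudocount) / n_before
        share_after = (after.get(tag, 0) + pseudocount) / n_after
        changes[tag] = math.log2(share_after / share_before)
    return changes


# Hypothetical pool of three tagged deletion strains, sampled before and after a selection.
before = {"tag_del1": 1000, "tag_del2": 1000, "tag_del3": 1000}
after = {"tag_del1": 1500, "tag_del2": 1400, "tag_del3": 100}
for tag, change in sorted(tag_log2_changes(before, after).items(), key=lambda kv: kv[1]):
    print(tag, round(change, 2))
```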

These ORF genome arrays and oligonucleotide-tagged libraries can be used for many applications. Any conventional selection applied to a library that gives discrete or multiple products can use these technologies for a simple direct readout. These include screens and selections for mutant complementation, overexpression suppression (15, 16), second-site suppressors, synthetic lethality, drug target overexpression (17), two-hybrid screens (18), genome mismatch scanning (19), or recombination mapping.

The genome projects have provided researchers with a vast 
amount of information. These data must be used efficiently 
and systematically to gain a truly comprehensive understand- 
ing of gene function and, more broadly, of the entire genome 
which can then be applied to other organisms. Such global 
approaches are essential if we are to gain an understanding of 
the living cell. This understanding should come from the 
viewpoint of the integration of complex regulatory networks, 
the individual roles and interactions of thousands of functional 
gene products, and the effect of environmental changes on 
both gene regulatory networks and the roles of all gene 
products. The time has come to switch from the analysis of a
single gene to the analysis of the whole genome.
The availability of genome-scale DNA sequence information and reagents has radically altered life-science research. This revolution has led to the development of a new scientific subdiscipline derived from a combination of the fields of toxicology and genomics. This subdiscipline, termed toxicogenomics, is concerned with the identification of potential human and environmental toxicants, and their putative mechanisms of action, through the use of genomics resources. One such resource is the DNA microarray, or "chip," which allows the monitoring of the expression levels of thousands of genes simultaneously. Here we propose a general method by which gene expression, as measured by cDNA microarrays, can be used as a highly sensitive and informative marker for toxicity. Our purpose is to acquaint the reader with the development and current state of microarray technology and to present our view of the usefulness of microarrays to the field of toxicology. Mol. Carcinog. 24:153-
INTRODUCTION

Technological advancements combined with in- 
tensive DNA sequencing efforts have generated an 
enormous database of sequence information over the 
past decade. To date, more than 3 million sequences, 
totaling over 2.2 billion bases [1], are contained
within the GenBank database, which includes the 
complete sequences of 19 different organisms [2]. The 
first complete sequence of a free-living organism, 
Haemophilus influenzae, was reported in 1995 [3] and 
was followed shortly thereafter by the first complete 
sequence of a eukaryote, Saccharomyces cerevisiae [4].
The development of dramatically improved sequenc- 
ing methodologies promises that complete elucida- 
tion of the Homo sapiens DNA sequence is not far 
behind [5]. 

To exploit more fully the wealth of new sequence 
information, it was necessary to develop novel meth- 
ods for the high-throughput or parallel monitoring 
of gene expression. Established methods such as 
northern blotting, RNase protection assays, S1 nuclease
analysis, plaque hybridization, and slot blots
do not provide sufficient throughput to effectively 
utilize the new genomics resources. Newer methods 
such as differential display [6], high-density filter 
hybridization [7,8], serial analysis of gene expression 
[9], and cDNA- and oligonucleotide-based microarray
"chip" hybridization [10-12] are possible solutions 
to this bottleneck. It is our belief that the microarray 
approach, which allows the monitoring of expres- 
sion levels of thousands of genes simultaneously, is 
a tool of unprecedented power for use in toxicology 
studies. 



Almost without exception, gene expression is al- 
tered during toxicity, as either a direct or indirect 
result of toxicant exposure. The challenge facing 
toxicologists is to define, under a given set of experimental
conditions, the characteristic and specific pattern of gene
expression elicited by a given
toxicant. Microarray technology offers an ideal plat- 
form for this type of analysis and could be the foun- 
dation for a fundamentally new approach to 
toxicology testing. 

MICROARRAY DEVELOPMENT AND APPLICATIONS 
cDNA Microarrays 

In the past several years, numerous systems were
developed for the construction of large-scale DNA 
arrays. All of these platforms are based on cDNAs 
or oligonucleotides immobilized to a solid sup- 
port. In the cDNA approach, cDNA (or genomic) 
clones of interest are arrayed in a multi-well for- 
mat and amplified by polymerase chain reaction. 
The products of this amplification, which are usu- 
ally 500- to 2000-bp clones from the 3' regions of 
the genes of interest, are then spotted onto solid 
support by using high-speed robotics. By using 
this method, microarrays of up to 10 000 clones 
can be generated by spotting onto a glass substrate 
[13,14]. Sample detection for microarrays on glass
involves the use of probes labeled with fluores- 
cent or radioactive nucleotides. 

Fluorescent cDNA probes are generated from control
and test RNA samples in single-round reverse-transcription
reactions in the presence of fluorescently
tagged dUTP (e.g., Cy3-dUTP and Cy5-dUTP), which
produces control and test products labeled with dif- 
ferent fluors. The cDNAs generated from these two 
populations, collectively termed the "probe," are then 
mixed and hybridized to the array under a glass cov- 
erslip [10,11,15]. The fluorescent signal is detected 
by using a custom-designed scanning confocal
microscope equipped with a motorized stage and lasers
for fluor excitation [10,11,15]. The data are analyzed
with custom digital image analysis software that de- 
termines for each DNA feature the ratio of fluor 1 to 
fluor 2, corrected for local background [16,17]. The 
strength of this approach lies in the ability to label 
RNAs from control and treated samples with differ- 
ent fluorescent nucleotides, allowing for the simul- 
taneous hybridization and detection of both 
populations on one microarray. This method elimi- 
nates the need to control for hybridization between 
arrays. The research groups of Drs. Patrick Brown and 
Ron Davis at Stanford University spearheaded the 
effort to develop this approach, which has been suc- 
cessfully applied to studies of Arabidopsis thaliana 
RNA [10], yeast genomic DNA [15], tumorigenic versus
non-tumorigenic human tumor cell lines [11],
human T-cells [18], yeast RNA [19], and human in- 
flammatory disease-related genes [20]. The most dra- 
matic result of this effort was the first published 
account of gene expression of an entire genome, that 
of the yeast Saccharomyces cerevisiae [21].
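The per-feature ratio calculation described above can be sketched as follows. It assumes the image-analysis step has already reported the foreground and local background intensity in both channels for each spot. The clipping floor and the two-fold cut-off for flagging a gene are illustrative choices and do not come from the text. No between-channel normalization is applied here. All names and values are hypothetical.

```python
import math


def feature_log2_ratio(fg1, bg1, fg2, bg2, floor=1.0):
    """log2(fluor 1 / fluor 2) for one array feature after local background subtraction.

    Background-subtracted signals are clipped at `floor` so that dim spots
    cannot produce zero or negative intensities.
    """
    s1 = max(fg1 - bg1, floor)
    s2 = max(fg2 - bg2, floor)
    return math.log2(s1 / s2)


def flag_changed(features, min_fold=2.0):
    """Names of features whose ratio differs from 1 by at least min_fold in either direction."""
    cutoff = math.log2(min_fold)
    return [name for name, values in features.items()
            if abs(feature_log2_ratio(*values)) >= cutoff]


# Hypothetical raw readings: (fluor 1 spot, fluor 1 local bkg, fluor 2 spot, fluor 2 local bkg).
features = {
    "geneA": (2200, 200, 1200, 200),  # 2000 / 1000 -> 2-fold higher in fluor 1
    "geneB": (700, 200, 600, 100),    # 500 / 500   -> unchanged
    "geneC": (450, 50, 1650, 50),     # 400 / 1600  -> 4-fold lower in fluor 1
}
print(flag_changed(features))  # ['geneA', 'geneC']
```

Working with log2 ratios makes up- and down-regulation symmetric. For example, a 2-fold increase and a 2-fold decrease become +1 and -1.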

In an alternative approach, large numbers of cDNA 
clones can be spotted onto a membrane support, al- 
beit at a lower density [7,22]. This method is useful 
for expression profiling and large-scale screening and 
mapping of genomic or cDNA clones [7,22-24]. In 
expression profiling on filter membranes, two dif- 
ferent membranes are used simultaneously for con- 
trol and test RNA hybridizations, or a single 
membrane is stripped and reprobed. The signal is 
detected by using radioactive nucleotides and visu- 
alized by phosphorimager analysis or autoradiogra- 
phy. Numerous companies now sell such cDNA 
membranes and software to analyze the image data 
[25-27]. 
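When control and test RNA are hybridized to separate membranes, or one membrane is stripped and reprobed, the two images must be put on a common scale before spots can be compared. One simple option is to express every spot as a fraction of its membrane's total signal. This option is assumed here and is not taken from the text. The spot names and volumes are hypothetical.

```python
def normalize_to_total(signals):
    """Express each spot as a fraction of the membrane's total signal."""
    total = sum(signals.values())
    return {spot: value / total for spot, value in signals.items()}


def membrane_ratios(control, test):
    """Test / control ratio per spot after total-signal normalization of each membrane."""
    c = normalize_to_total(control)
    t = normalize_to_total(test)
    return {spot: t[spot] / c[spot] for spot in c if spot in t and c[spot] > 0}


# Hypothetical phosphorimager volumes; the test membrane is globally twice as bright.
control = {"g1": 100, "g2": 300, "g3": 600}
test = {"g1": 400, "g2": 600, "g3": 1000}
print(membrane_ratios(control, test))
```

Total-signal scaling assumes that most spots do not change between conditions. If a treatment shifts many genes in the same direction, a set of stable reference spots would be a safer basis for normalization.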

Oligonucleotide Microarrays 

Oligonucleotide microarrays are constructed either 
by spotting prefabricated oligos on a glass support 
[13] or by the more elegant method of direct in situ 
oligo synthesis on the glass surface by photolithog- 
raphy [28-30]. The strength of this approach lies in 
its ability to discriminate DNA molecules based on
single base-pair differences. This allows the application
of this method to the fields of medical diagnostics,
pharmacogenetics, and sequencing by hybridization
as well as gene-expression analysis.

Fabrication of oligonucleotide chips by photoli- 
thography is theoretically simple but technically 
complex [29,30]. The light from a high-intensity 
mercury lamp is directed through a photolithographic
mask onto the silica surface, resulting in
deprotection of the terminal nucleotides in the illu- 
minated regions. The entire chip is then reacted with 
the desired free nucleotide, resulting in selected chain 
elongation. This process requires only 4n cycles 
(where n = oligonucleotide length in bases) to syn- 
thesize a vast number of unique oligos, the total num- 
ber of which is limited only by the complexity of the 
photolithographic mask and the chip size [29,31,32]. 
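The 4n-cycle count follows from the masking scheme. For each position in the oligonucleotide, one cycle is run for each of the four bases, and the mask exposes only the features that need that base at that position. The toy simulation below illustrates this using strings instead of chemistry. It assumes equal-length target sequences made up of A, C, G and T, and the feature names are hypothetical. With n = 25, for example, 100 cycles could in principle address any of the 4^25 possible 25-mers.

```python
def photolithographic_synthesis(targets):
    """Simulate light-directed, base-by-base oligo synthesis on a chip.

    targets maps feature id -> desired sequence (all the same length n).
    Each cycle pairs one position with one base: the 'mask' illuminates
    (deprotects) only the features whose target has that base at that
    position, and those features are extended by one nucleotide.
    """
    lengths = {len(s) for s in targets.values()}
    if len(lengths) != 1:
        raise ValueError("sketch assumes all oligos have the same length")
    if set("".join(targets.values())) - set("ACGT"):
        raise ValueError("sequences must contain only A, C, G and T")
    n = lengths.pop()
    grown = {feature: "" for feature in targets}
    cycles = 0
    for position in range(n):
        for base in "ACGT":
            cycles += 1
            illuminated = [f for f, seq in targets.items() if seq[position] == base]
            for feature in illuminated:
                grown[feature] += base  # coupling happens only where deprotected
    return grown, cycles


chip = {"feature_1": "ACGT", "feature_2": "TTGA", "feature_3": "GCCA"}
grown, cycles = photolithographic_synthesis(chip)
print(grown == chip, cycles)  # True 16
```

The number of cycles depends only on the oligonucleotide length, not on how many different features are on the chip. This is why the number of unique oligos is limited by the mask and chip size rather than by the chemistry.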

Sample preparation involves the generation of 
double-stranded cDNA from cellular poly(A)+ RNA 
followed by antisense RNA synthesis in an in vitro 
transcription reaction with biotinylated or fluor- 
tagged nucleotides. The RNA probe is then frag- 
mented to facilitate hybridization. If the indirect 
visualization method is used, the chips are incubated 
with fluor-linked streptavidin (e.g., phycoerythrin) 
after hybridization [12,33]. The signal is detected with
a custom confocal scanner [34]. This method has 
been applied successfully to the mapping of genomic 
library clones [35], to de novo sequencing by hybridization
[28,36], and to evolutionary sequence comparison
of the BRCA1 gene [37]. In addition,
mutations in the cystic fibrosis [38] and BRCA1 [39]
gene products and polymorphisms in the human immunodeficiency
virus-1 clade B protease gene [40]
have been detected by this method. Oligonucleotide 
chips are also useful for expression monitoring [33] 
as has been demonstrated by the simultaneous evalu- 
ation of gene-expression patterns in nearly all open 
reading frames of the yeast strain S. cerevisiae [12]. 
More recently, oligonucleotide chips have been used 
to help identify single nucleotide polymorphisms in 
the human [41] and yeast [42] genomes. 

THE USE OF MICROARRAYS IN TOXICOLOGY 

Screening for Mechanism of Action

The field of toxicology uses numerous in vivo model systems, including the rat, mouse, and rabbit, to assess potential toxicity, and these bioassays are the mainstay of toxicology testing. However, in the past several decades, a plethora of in vitro techniques have been developed to measure toxicity, many of which measure toxicant-induced DNA damage. Examples of these assays include the Ames test, the Syrian hamster embryo cell transformation assay, micronucleus assays, measurements of sister chromatid exchange and unscheduled DNA synthesis, and many others. Fundamental to all of these methods is the fact that toxicity is often preceded by, and results in, alterations in gene expression. In many cases, these changes in gene expression are a far more sensitive, characteristic, and measurable endpoint than the toxicity itself. We therefore propose that a method based on measurements of the genome-wide gene expression pattern of an organism after toxicant exposure is fundamentally informative and complements the established methods described above.

We are developing a method by which toxicants can be identified and their putative mechanisms of action determined by using toxicant-induced gene expression profiles. In this method, dose and time-course parameters are established in one or more defined model systems for a series of toxicants within a given prototypic class (e.g., polycyclic aromatic hydrocarbons (PAHs)). Cells are then treated with these agents at a fixed toxicity level (as measured by cell survival), RNA is harvested, and toxicant-induced gene expression changes are assessed by hybridization to a cDNA microarray chip (Figure 1). We have developed a custom DNA chip, called ToxChip v1.0, specifically for this purpose and will discuss it in more detail below. The changes in gene expression induced by the test agents in the model systems are analyzed, and the common set of changes unique to that class of toxicants, termed a toxicant signature, is determined.

This signature is derived by ranking the gene-expression data across all experiments, based on the relative fold induction or suppression of genes in treated samples versus untreated controls, and by selecting the most consistently different signals across the sample set. A different signature may be established for each prototypic toxicant class. Once the signatures are determined, gene-expression profiles induced by unknown agents in these same model systems can be compared with the established signatures. A match assigns a putative mechanism of action to the test compound. Figure 2 illustrates this signature method for different types of oxidant stressors, PAHs, and peroxisome proliferators. In this example, the unknown compound in question had a gene-expression profile similar to that of the oxidant stressors in the database. We anticipate that this general method will also reveal cross talk between different pathways induced by a single agent (e.g., reveal that a compound has both PAH-like and oxidant-like properties). In the future, it may be necessary to distinguish very subtle differences between compounds within a very large sample set (e.g., thousands of highly similar structural isomers in a combinatorial chemistry library or peptide library). To generate these highly refined signatures, standard statistical clustering techniques or principal-component analysis can be used.
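The ranking-and-matching procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the ToxChip analysis. Its assumptions are as follows:

- Each toxicant class is summarized as a matrix of log2(treated/control) ratios, with experiments as rows and genes as columns.
- Consistency is scored as |mean| divided by the standard deviation, and the top-scoring genes are kept as the signature.
- An unknown profile is assigned to the class whose signature it correlates with best, using Pearson correlation.

The function names, the scoring rule, `top_n`, and all numbers are hypothetical.

```python
import numpy as np


def derive_signature(log_ratios, top_n=3):
    """Select the genes with the most consistent change across experiments.

    log_ratios: array (n_experiments, n_genes) of log2(treated/control) values
    for one prototypic toxicant class. The score |mean| / std is a hypothetical
    choice. Returns (sorted gene indices, mean log ratio of those genes).
    """
    mean = log_ratios.mean(axis=0)
    std = log_ratios.std(axis=0, ddof=1)
    score = np.abs(mean) / (std + 1e-6)   # large = big and reproducible change
    idx = np.sort(np.argsort(score)[::-1][:top_n])
    return idx, mean[idx]


def match_signature(profile, signatures):
    """Return (class name, Pearson r) of the best-matching signature."""
    best_name, best_r = None, -np.inf
    for cls, (idx, centroid) in signatures.items():
        r = np.corrcoef(profile[idx], centroid)[0, 1]
        if r > best_r:
            best_name, best_r = cls, r
    return best_name, best_r


# Hypothetical data: 3 experiments x 6 genes per class (log2 treated/control).
oxidant = np.array([[2.0, 1.9, -1.5, 0.1, 0.0, -0.2],
                    [2.2, 2.1, -1.4, -0.3, 0.4, 0.1],
                    [1.8, 2.0, -1.6, 0.2, -0.3, 0.0]])
pah = np.array([[0.1, -0.2, 0.0, 2.5, 1.2, -1.0],
                [-0.2, 0.3, 0.1, 2.4, 1.0, -1.1],
                [0.2, 0.0, -0.1, 2.6, 1.1, -0.9]])
signatures = {"oxidant": derive_signature(oxidant), "PAH": derive_signature(pah)}

unknown = np.array([1.9, 2.0, -1.3, 0.0, 0.2, 0.1])
name, r = match_signature(unknown, signatures)
print(name, round(float(r), 3))
```

With these made-up numbers, the unknown profile follows the oxidant genes closely and should be assigned to the oxidant class. Clustering or principal-component analysis, as the text notes, would replace the simple top-n ranking when signatures must separate very similar compounds.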

Figure 1. Simplified overview of the method for sample preparation and hybridization to cDNA microarrays. For illustrative purposes, samples derived from cell culture are depicted, although other sample types are amenable to this analysis.

Figure 2. Schematic representation of the method for identification of a toxicant's mechanism of action. In this method, gene-expression data derived from exposure of model systems to known toxicants are analyzed, and a set of changes characteristic of that type of toxicant (termed the toxicant signature) is identified. As depicted, oxidant stressors produce consistent changes in group A genes (indicated by red and green circles), but not in group B or C genes (indicated by gray circles). The set of gene-expression changes elicited by the suspected toxicant is then compared with these characteristic patterns, and a putative mechanism of action is assigned to the unknown agent.

For the studies outlined in Figure 2, we developed the custom cDNA microarray chip ToxChip v1.0. The 2090 human genes that comprise this subarray were selected for their well-documented involvement in basic cellular processes as well as their responses to different types of toxic insult. Included on this list are DNA replication and repair genes, apoptosis genes, and genes responsive to PAHs and dioxin-like compounds, peroxisome proliferators, estrogenic compounds, and oxidant stress. Some of the other categories of genes include transcription factors, oncogenes, tumor suppressor genes, cyclins, kinases, phosphatases, cell adhesion and motility genes, and homeobox genes. Also included in this group are 84 housekeeping genes, whose hybridization intensity is averaged and used for signal normalization of the other genes on the chip. To date, very few toxicants have been shown to have appreciable effects on the expression of these housekeeping genes. However, this housekeeping list will be revised if new data warrant the addition or deletion of a particular gene. Table 1 contains a general description of some of the different classes of genes that comprise ToxChip v1.0.

When a toxicant signature is determined, the genes within this signature are flagged within the database. When uncharacterized toxicants are then screened, the data can be quickly reformatted so that blocks of genes representing the different signatures are displayed [11]. This facilitates rapid, visual interpretation of data. We are also developing ToxChip v2.0 and chips for other model systems, including rat, mouse, Xenopus, and yeast, for use in toxicology studies.
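The two bookkeeping steps described above can be sketched as follows:

- averaging the 84 housekeeping genes to normalize every other spot, and
- regrouping genes so that each flagged signature forms a contiguous block.

The source states only that housekeeping intensities are averaged and used for normalization. Dividing each spot by that mean is an assumption here. The function names, the dictionary of signature flags, and the gene symbols are illustrative choices, not the ToxChip database schema.

```python
import numpy as np


def normalize_to_housekeeping(intensities, housekeeping_idx):
    """Divide every spot intensity by the mean housekeeping-gene intensity."""
    intensities = np.asarray(intensities, dtype=float)
    hk_mean = intensities[housekeeping_idx].mean()
    return intensities / hk_mean


def order_by_signature(gene_names, signature_flags):
    """Group genes flagged for each signature into contiguous blocks.

    signature_flags maps a signature name to the set of gene names flagged for
    it (a hypothetical structure). Genes keep their original order within a
    block; unflagged genes follow at the end.
    """
    order, placed = [], set()
    for signature in sorted(signature_flags):
        block = [g for g in gene_names
                 if g in signature_flags[signature] and g not in placed]
        order.extend(block)
        placed.update(block)
    order.extend(g for g in gene_names if g not in placed)
    return order


raw = [200.0, 400.0, 1000.0, 50.0]                   # hypothetical spot intensities
normalized = normalize_to_housekeeping(raw, [0, 1])  # housekeeping mean = 300
genes = ["GAPDH", "ACTB", "CYP1A1", "HMOX1", "TP53"]  # illustrative symbols
flags = {"PAH": {"CYP1A1"}, "oxidant": {"HMOX1"}}
print(normalized)
print(order_by_signature(genes, flags))
```

A gene already placed in one block is not repeated in another. This reflects the footnote to Table 1, which notes that some genes belong to multiple categories.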

Animal Models in Toxicology Testing 

The toxicology community relies heavily on the use of animals as model systems for toxicology testing. Unfortunately, these assays are inherently expensive, require large numbers of animals, and take a long time to complete and analyze. Therefore, the National Institute of Environmental Health Sciences (NIEHS), the National Toxicology Program, and the toxicology community at large are committed to reducing the number of animals used by developing more efficient and alternative testing methodologies. Although substantial progress has been made in the development of alternative methods, bioassays are still used for testing endpoints such as neurotoxicity, immunotoxicity, reproductive and developmental toxicology, and genetic toxicology. The rodent cancer bioassay is a particularly expensive and time-consuming assay, as it requires almost 4 yr, 1200 animals, and millions of dollars to execute and analyze [43]. In vitro experiments of the type outlined in Figure 2 might provide evidence that an unknown agent is (or is not) responsible for eliciting a given biological response. This information would help to select a bioassay more specifically suited to the agent in question or perhaps suggest that a bioassay is not necessary, which would dramatically reduce cost, animal use, and time.

Table 1. ToxChip v1.0: A Human cDNA Microarray Chip Designed to Detect Responses to Toxic Insult

Gene category                              No. of genes on chip
Apoptosis                                  72
DNA replication and repair                 99
Oxidative stress/redox homeostasis         90
Peroxisome proliferator responsive         22
Dioxin/PAH responsive                      12
Estrogen responsive                        63
Housekeeping                               84
Oncogenes and tumor suppressor genes       76
Cell-cycle control                         51
Transcription factors                      131
Kinases                                    276
Phosphatases                               88
Heat-shock proteins                        23
Receptors                                  349
Cytochrome P450s                           30

*This list is intended as a general guide. The gene categories are not unique, and some genes are listed in multiple categories.

The addition of microarray techniques to stan- 
dard bioassays may dramatically enhance the sen- 
sitivity and interpretability of the bioassay and 
possibly reduce its cost. Gene-expression signatures 
could be determined for various types of tissue-spe- 
cific toxicants, and new compounds could be 
screened for these characteristic signatures, provid- 
ing a rapid and sensitive in vivo test. Also, because 
gene expression is often exquisitely sensitive to low 
doses of a toxicant, the combination of gene-expression screening and the bioassay might allow the use
of lower toxicant doses, which are more relevant to 
human exposure levels, and the use of fewer ani- 
mals. In addition, gene-expression changes are nor- 
mally measured in hours or days, not in the months 
to years required for tumor development. Further- 
more, microarrays might be particularly useful for 
investigating the relationship between acute and 
chronic toxicity and identifying secondary effects 
of a given toxicant by studying the relationship 
between the duration of exposure to a toxicant and 
the gene-expression profile produced. Thus, a bio- 
assay that incorporates gene-expression signatures 
with traditional endpoints might be substantially 
shorter, use more realistic dose regimens, and cost 
substantially less than the current assays do. 

These considerations are also relevant for branches 
of toxicology not related to human health and not 
using rodents as model systems, such as aquatic toxicology and plant pathology. Bioassays based on the fathead minnow, Daphnia, and Arabidopsis could also be improved by the addition of microarray analysis. The combination of microarrays with traditional bioassays might also be useful for investigating some of the more intractable problems in toxicology research, such as the effects of complex mixtures and the difficulties in cross-species extrapolation.

Exposure Assessment, Environmental Monitoring, 
and Drug Safety 

The currently used methods for assessment of ex- 
posure to chemical toxicants are based on measure- 
ment of tissue toxin levels or on surrogate markers 
of toxicity, termed biomarkers (e.g., peripheral blood 
levels of hepatic enzymes or DNA adducts). Because 
gene expression is a sensitive endpoint, gene expres- 
sion as measured with microarray technology may 
be useful as a new biomarker to more precisely iden- 
tify hazards and to assess exposure. Similarly, 
microarrays could be used in an environmental- 
monitoring capacity to measure the effect of poten- 
tial contaminants on the gene-expression profiles 
of resident organisms. In an analogous fashion, 
microarrays could be used to measure gene-expression endpoints in subjects in clinical trials. The combination of these gene-expression data and more
established toxic endpoints in these trials could be 
used to define highly precise surrogates of safety. 

Gene-expression profiles in samples from exposed 
individuals could be compared to the profiles of the 
same individuals before exposure. From this infor- 
mation, the nature of the toxic exposure can be de- 
termined or a relative clinical safety factor estimated. 
In the future it may also be possible to estimate not 
only the nature but the dose of the toxicant for a 
given exposure, based on relative gene-expression 
levels. This general approach may be particularly 
appropriate for occupational-health applications, in 
which unexposed and exposed samples from the
same individuals may be obtainable. For example, 
a pilot study of gene expression in peripheral-blood 
lymphocytes of Polish coke-oven workers exposed 
to PAHs (and many other compounds) is under con- 
sideration at the NIEHS. An important consideration 
for these types of studies is that gene expression can 
be affected by numerous factors, including diet, 
health, and personal habits. To reduce the effects 
of these confounding factors, it may be necessary 
to compare pools of control samples with pools of 
treated samples. In the future it may be possible to 
compare exposed sample sets to a national database 
of human-expression data, thus eliminating the 
need to provide an unexposed sample from the same 
individual. Efforts to develop such a national gene-expression database are currently under way [44,45].
However, this national database approach will re- 
quire a better understanding of genome-wide gene 
expression across the highly diverse human popu- 
lation and of the effects of environmental factors 
on this expression. 
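Comparing each exposed sample with the same individual's pre-exposure sample is, in effect, a paired design. The following is a minimal sketch under these assumptions:

- Raw intensities are arranged as individuals × genes.
- A one-sample t statistic is computed on the per-individual log2(post/pre) ratios.

Genes with a large statistic change consistently across individuals despite person-to-person baseline differences. The statistic, the pseudocount, the function names, and the numbers are all hypothetical choices. The pooled design mentioned above would instead average each group before forming a single ratio; that design is not shown here.

```python
import numpy as np


def paired_log_ratios(pre, post, pseudocount=1.0):
    """log2(post / pre) for each individual (row) and gene (column).

    The pseudocount avoids division by zero for undetected transcripts; its
    value is a hypothetical choice.
    """
    pre = np.asarray(pre, dtype=float)
    post = np.asarray(post, dtype=float)
    return np.log2((post + pseudocount) / (pre + pseudocount))


def paired_t_statistic(log_ratios):
    """One-sample t statistic of per-individual log ratios against zero."""
    n = log_ratios.shape[0]
    mean = log_ratios.mean(axis=0)
    se = log_ratios.std(axis=0, ddof=1) / np.sqrt(n)
    return mean / se


# Hypothetical intensities for 3 individuals x 2 genes, before and after exposure.
pre = [[100, 100], [200, 120], [50, 80]]
post = [[400, 110], [900, 110], [190, 85]]
t = paired_t_statistic(paired_log_ratios(pre, post))
print(t)
```

In this toy data, the first gene rises about fourfold in every individual and yields a large statistic. The second gene drifts up or down by roughly 10% and does not.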
Alleles, Oligo Arrays, and Toxicogenetics

Gene sequences vary between individuals, and 
this variability can be a causative factor in human 
diseases of environmental origin [46,47]. A new area 
of toxicology, termed toxicogenetics, was recently 
developed to study the relationship between genetic 
variability and toxicant susceptibility. This field is 
not the subject of this discussion, but it is worth- 
while to note that the ability of oligonucleotide ar- 
rays to discriminate DNA molecules based on single 
base-pair differences makes these arrays uniquely 
useful for this type of analysis. Recent reports dem- 
onstrated the feasibility of this approach [41,42]. 
The NIEHS has initiated the Environmental Genome 
Project to identify common sequence polymor- 
phisms in 200 genes thought to be involved in en- 
vironmental diseases [48]. In a pilot study on the 
feasibility of this application to the Environmental 
Genome Project, oligonucleotide arrays will be used 
to resequence 20 candidate genes. This toxicogenetic 
approach promises to dramatically improve our understanding of interindividual variability in disease
susceptibility. 

FUTURE PRIORITIES 

There are many issues that must be addressed be- 
fore the full potential of microarrays in toxicology 
research can be realized. Among these are model sys- 
tem selection, dose selection, and the temporal na- 
ture of gene expression. In other words, in which 
species, at what dose, and at what time do we look 
for toxicant-induced gene expression? If human 
samples are analyzed, how variable is global gene 
expression between individuals, before and after toxi- 
cant exposure? What are the effects of age, diet, and 
other factors on this expression? Experience, in the 
form of large data sets of toxicant exposures, will 
answer these questions. 

One of the most pressing issues for array scientists 
is the construction of a national public database 
(linked to the existing public databases) to serve as a 
repository for gene-expression data. This relational 
database must be made available for public use, and 
researchers must be encouraged to submit their ex- 
pression data so that others may view and query the 
information. Researchers at the National Institutes 
of Health have made laudable progress in develop- 
ing the first generation of such a database [44,45]. In 
addition, improved statistical methods for gene clus- 
tering and pattern recognition are needed to ana- 
lyze the data in such a public database. 

The proliferation of different platforms and methods for microarray hybridization will improve sample handling, data collection, and analysis, and will reduce costs. However, the variety of microarray methods available will create problems of data compatibility between platforms. In addition, the near-infinite variety of experimental conditions under which data will be collected by different laboratories will make large-scale data analysis extremely difficult. To help circumvent these future problems, a set of standards to be included on all platforms should be established. These standards would facilitate data entry into the national database and serve as reference points for cross-platform and interlaboratory data analysis.
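The role of shared standards as reference points can be illustrated with a deliberately simple rescaling. This sketch assumes that every array carries the same standard spots with agreed reference levels. The function name, the levels, and the median-ratio rule are hypothetical.

Each platform's intensities are multiplied by a single factor so that its standards line up with the reference. After scaling, a non-standard gene measured on two platforms becomes directly comparable.

```python
import numpy as np


def scale_to_standards(values, standard_idx, reference_levels):
    """Rescale one array's intensities so its standard spots match references.

    A single factor, the median of reference/observed over the standard spots,
    is applied to every spot (a hypothetical, deliberately simple scheme).
    """
    values = np.asarray(values, dtype=float)
    observed = values[standard_idx]
    factor = np.median(np.asarray(reference_levels, dtype=float) / observed)
    return values * factor


reference = [100.0, 200.0]  # hypothetical agreed levels of two shared standards
platform_a = scale_to_standards([10.0, 20.0, 5.0], [0, 1], reference)
platform_b = scale_to_standards([50.0, 100.0, 25.0], [0, 1], reference)
print(platform_a, platform_b)  # the non-standard gene (index 2) now agrees
```

Real cross-platform comparison would also have to handle intensity-dependent and probe-specific biases, which a single scale factor cannot capture.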

Many issues remain to be resolved, but it is clear 
that new molecular techniques such as microarray 
hybridization will have a dramatic impact on toxicology research. In the future, the information gathered from microarray-based hybridization experiments will
form the basis for an improved method to assess the 
impact of chemicals on human and environmental 
health. 
Abstract

Recent progress in genomics and proteomics technologies has created a unique opportunity to significantly impact 
the pharmaceutical drug development processes. The perception that cells and whole organisms express specific 
inducible responses to stimuli such as drug treatment implies that unique expression patterns, molecular fingerprints, 
indicative of a drug's efficacy and potential toxicity are accessible. The integration into state-of-the-art toxicology of 
assays allowing one to profile treatment-related changes in gene expression patterns promises new insights into 
mechanisms of drug action and toxicity. The benefits will be improved lead selection, and optimized monitoring of 
drug efficacy and safety in pre-clinical and clinical studies based on biologically relevant tissue and surrogate markers. 
© 2000 Elsevier Science Ireland Ltd. All rights reserved. 
1. Introduction

The majority of drugs act by binding to protein targets, most of them known proteins such as enzymes, receptors and channels, resulting in effects such as enzyme inhibition and impairment of signal transduction. The treatment-induced perturbations provoke feedback reactions that aim to compensate for the stimulus; these reactions are almost always associated with signals to the nucleus, resulting in altered gene expression. Such changes in gene regulation account for both the pharmacological action and the toxicity of a drug and can be visualized by either global mRNA or global protein expression profiling. Hence, for each individual drug, a characteristic gene regulation pattern, its molecular fingerprint, exists which bears valuable information on its mode of action and its mechanism of toxicity.

Gene expression is a multistep process that 
results in an active protein (Fig. 1). There exist 
numerous regulation systems that exert control at 
and after the transcription and the translation 
step. Genomics, by definition, encompasses the 
quantitative analysis of transcripts at the mRNA 
level, while the aim of proteomics is to quantify 
gene expression further down-stream, creating a 
snapshot of gene regulation closer to ultimate cell 
function control. 



0378-4274/00/$ - see front matter © 2000 Elsevier Science Ireland Ltd. All rights reserved.
PII: S0378-4274(99)00236-2






S. Steiner, N.L. Anderson / Toxicology Letters 112-113 (2000) 467-471



2. Global mRNA profiling 

Expression data at the mRNA level can be produced using a variety of technologies such as DNA microarrays, reverse transcript imaging, amplified fragment length polymorphism (AFLP), serial analysis of gene expression (SAGE) and others. Currently, DNA microarrays are very popular and hold great promise. On a typical array, each gene of interest is represented either by a long DNA fragment (200-2400 bp), typically generated by polymerase chain reaction (PCR) and spotted on a suitable substrate using robotics (Schena et al., 1995; Shalon et al., 1996), or by several short oligonucleotides (20-30 bp) synthesized directly onto a solid support using photolabile nucleotide chemistry (Fodor et al., 1991; Chee et al., 1996). From control and treated tissues, total RNA or mRNA is isolated and reverse transcribed in the presence of radioactive or fluorescently labeled nucleotides, and the labeled probes are then hybridized to the arrays. The intensity of the array signal is measured for each gene transcript by either autoradiography or laser scanning confocal microscopy. The ratio between the signals of control and treated samples reflects the relative drug-induced change in transcript abundance.
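The control/treated ratio described above is usually examined on a log2 scale, so that a twofold induction (+1) and a twofold suppression (-1) are symmetric. A minimal sketch, assuming one intensity value per gene per sample. The background subtraction, the clipping floor, and the twofold cutoff are hypothetical parameters, not values from the cited studies.

```python
import numpy as np


def log2_ratios(control, treated, background=0.0, floor=1.0):
    """Per-gene log2(treated / control) after background subtraction.

    Values are clipped at `floor` so weak or absent spots do not produce
    log(0); the background and floor values are hypothetical parameters.
    """
    c = np.clip(np.asarray(control, dtype=float) - background, floor, None)
    t = np.clip(np.asarray(treated, dtype=float) - background, floor, None)
    return np.log2(t / c)


def changed_genes(log_ratios, min_fold=2.0):
    """Mask of genes changed at least `min_fold` up or down (hypothetical cutoff)."""
    return np.abs(log_ratios) >= np.log2(min_fold)


control = [100.0, 100.0, 100.0, 100.0]  # hypothetical signal intensities
treated = [400.0, 100.0, 50.0, 150.0]
ratios = log2_ratios(control, treated)
print(ratios, changed_genes(ratios))
```

The same calculation applies to the spot densities of two-dimensional gels described in the next section, once spots have been matched between control and treated patterns.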



3. Global protein profiling

Global quantitative expression analysis at the protein level is currently restricted to the use of two-dimensional gel electrophoresis. This technique combines separation of tissue proteins by isoelectric focusing in the first dimension and by sodium dodecyl sulfate slab gel electrophoresis-based molecular weight separation in the second, orthogonal dimension (Anderson et al., 1991). The product is a rectangular pattern of protein spots that are typically revealed by Coomassie Blue, silver or fluorescent staining (Fig. 2). Protein spots are identified by mass spectrometry following generation of peptide mass fingerprints (Mann et al., 1993) and sequence tags (Wilkins et al., 1996). Similar to the mRNA approach, the optical densities of corresponding spots from control and treated samples are compared to search for treatment-related changes.

4. Expression data analysis 

Bioinformatics forms a key element required to organize, analyze and store expression data from either source, the mRNA or the protein level. The overall objective, once a mass of high-quality quantitative expression data has been collected, is to visualize complex patterns of gene expression changes, to detect pathways and sets of genes tightly correlated with treatment efficacy and toxicity, and to compare the effects of different treatments (Anderson et al., 1996). As the drug effect database grows, one may detect similarities and differences between the molecular fingerprints produced by various drugs, information that may be crucial in deciding whether to refocus or extend the therapeutic spectrum of a drug candidate.

Fig. 1. Production of an active protein is a multistep process in which numerous regulation systems exert control at various stages of expression. Molecular fingerprints of drugs can be visualized through expression profiling at the mRNA level (genomics) using a variety of technologies and at the protein level (proteomics) using two-dimensional gel electrophoresis.

Fig. 2. Computerized representation of a Coomassie Blue stained two-dimensional gel electrophoresis pattern of Fischer F344 rat liver homogenate.



5. Comparison of global mRNA and protein 
expression profiling 

There are several synergies and overlaps between the data obtained by mRNA and protein expression analysis. Proteins encoded by low-abundance transcripts may not be easily quantified using standard two-dimensional gel electrophoresis, and their detection may require prefractionation of samples. The expression of such genes may preferably be quantified at the mRNA level using techniques that allow PCR-mediated target amplification. Tissue biopsy samples typically yield good-quality mRNA and proteins; however, the quality of mRNA isolated from body fluids is often poor because mRNA degrades faster than proteins. RNA samples from body fluids such as serum or urine are often not very 'meaningful', and secreted proteins are likely more reliable surrogate markers for treatment efficacy and safety. Detection of post-translational modifications, events often related to the function or nonfunction of a protein, is restricted to protein expression analysis and can rarely be predicted by mRNA profiling. Information on subcellular localization and translocation of proteins has to be acquired at the protein level in combination with sample prefractionation procedures. The growing evidence of a poor correlation between mRNA and protein abundance (Anderson and Seilhamer, 1997) further suggests that the two approaches, mRNA and protein profiling, are complementary and should be applied in parallel.
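One simple way to quantify how well mRNA and protein abundances agree is a rank correlation across gene products measured on both platforms. The sketch below computes Spearman's coefficient as the Pearson correlation of ranks. It assumes matched, tie-free measurements, and the data are made up. It illustrates the kind of comparison involved and is not the method used by Anderson and Seilhamer (1997).

```python
import numpy as np


def ranks(x):
    """Rank values 0..n-1 (ties are not handled, to keep the sketch short)."""
    x = np.asarray(x, dtype=float)
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(len(x))
    return r


def spearman(mrna, protein):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return float(np.corrcoef(ranks(mrna), ranks(protein))[0, 1])


# Hypothetical relative abundances of five gene products in one sample.
mrna = [1.0, 2.0, 3.0, 4.0, 5.0]
protein = [2.0, 1.0, 4.0, 3.0, 5.0]
print(spearman(mrna, protein))
```

A rank-based measure is used here because mRNA and protein signals are on unrelated scales. Only their ordering is compared.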

6. Expression profiling and drug development 

Understanding the mechanisms of action and toxicity, and being able to monitor treatment efficacy and safety during trials, is crucial for the successful development of a drug. Mechanistic insights are essential for the interpretation of drug effects and enhance the chances of recognizing potential species specificities, contributing to an improved risk profile in humans (Richardson et al., 1993; Steiner et al., 1996b; Aicher et al., 1998). The value of expression profiling further increases when links between treatment-induced expression profiles and specific pharmacological and toxic endpoints are established (Anderson et al., 1991, 1995, 1996; Steiner et al., 1996a). Changes in gene expression are known to precede the manifestation of morphological alterations. This gives expression profiling great potential for early compound screening: it enables one to select drug candidates with wide therapeutic windows, reflected by molecular fingerprints indicative of high pharmacological potency and low toxicity (Arce et al., 1998). In later phases of drug development, surrogate markers of treatment efficacy and toxicity can be applied to optimize the monitoring of pre-clinical and clinical studies (Doherty et al., 1998).



7. Perspectives

The basic methodology of safety evaluation has changed little during the past decades. Toxicity in laboratory animals has been evaluated primarily by using hematological, clinical chemistry and histological parameters as indicators of organ damage. The rapid progress in genomics and proteomics technologies creates a unique opportunity to dramatically improve the predictive power of safety assessment and to accelerate the drug development process. Application of gene and protein expression profiling promises to improve lead selection, resulting in the development of drug candidates with higher efficacy and lower toxicity. The identification of biologically relevant surrogate markers correlated with treatment efficacy and safety holds great potential to optimize the monitoring of pre-clinical and clinical trials.
DNA array technology makes it possible to rapidly genotype individuals or quantify the expression of thousands of genes on a single filter or glass slide, and holds enormous potential in toxicologic applications. This potential led to a U.S. Environmental Protection Agency-sponsored workshop titled "Application of Microarrays to Toxicology" on 7-8 January 1999 in Research Triangle Park, North Carolina. In addition to providing state-of-the-art information on the application of DNA or gene microarrays, the workshop catalyzed the formation of several collaborations, committees, and users' groups throughout the Research Triangle Park area and beyond. Potential applications of microarrays to toxicologic research and risk assessment include genome-wide expression analyses to identify gene-expression networks and toxicant-specific signatures that can be used to define mode of action, for exposure assessment, and for environmental monitoring. Arrays may also prove useful for monitoring genetic variability and its relationship to toxicant susceptibility in human populations. Key words: DNA arrays, gene arrays, microarrays, toxicology. Environ Health Perspect 107:681-685 (1999). (Online 6 July 1999)
Decoding the genetic blueprint is a dream that offers manifold returns in terms of understanding how organisms develop and function in an often hostile environment. With the rapid advances in molecular biology over the last 30 years, the dream has come a step closer to reality. Molecular biologists now have the ability to elucidate the composition of any genome. Indeed, almost 20 genomes have already been sequenced and more than 60 are currently under way. Foremost among these is the Human Genome Mapping Project. However, the genomes of a number of commonly used laboratory species are also under intensive investigation, including yeast, Arabidopsis, maize, rice, zebra fish, mouse, rat, and dog. It is widely expected that the completion of such programs will facilitate the development of many powerful new techniques and approaches to diagnosing and treating genetically and environmentally induced diseases which afflict mankind. However, the vast amount of data being generated by genome mapping will require new high-throughput technologies to investigate the function of the millions of new genes that are being reported. Among the most widely heralded of the new functional genomics technologies are DNA arrays, which represent perhaps the most anticipated new molecular biology technique since polymerase chain reaction (PCR).

Arrays enable the study of literally thousands of genes in a single experiment. The potential importance of arrays is enormous and has been highlighted by the recent publication of an entire Nature Genetics supplement dedicated to the technology (1). Despite this huge surge of interest, DNA arrays are still little used and largely unproven, as demonstrated by the



has driven venture capitalists into a frenzy of 
investment and many new companies are 
springing up to claim a share of this rapidly 
developing market. 

The U.S. Environmental Protection Agency (EPA) is interested in applying DNA array technology to ongoing toxicologic studies. To learn more about the current state of the technology, the Reproductive Toxicology Division (RTD) of the National Health and Environmental Effects Research Laboratory (NHEERL; Research Triangle Park, NC) hosted a workshop on "Application of Microarrays to Toxicology" on 7-8 January 1999 in Research Triangle Park, North Carolina. The workshop was organized by David Dix, Robert Kavlock, and John Rockett of the RTD/NHEERL. Twenty-two intramural and extramural scientists from government, academia, and industry shared information, data, and opinions on the current and future applications of this exciting new technology. The workshop had more than 130 attendees, including researchers, students, and administrators from the EPA, the National Institute of Environmental Health Sciences (NIEHS), and a number of other establishments from Research Triangle Park and beyond. Presentations ranged from the technology behind array production through the sharing of actual experimental data and projections on the future importance and applications of arrays. The information contained in the workshop presentations should provide aid and insight into arrays in general and their application to toxicology in particular.

Array Elements 

In the context of molecular biology, the word



a regular pattern to some kind of supportive medium. DNA array is often used interchangeably with gene array or microarray. Although not formally defined, microarray is generally used to describe the higher density arrays typically printed on glass chips. The DNA elements that make up DNA arrays can be oligonucleotides, partial gene sequences, or full-length cDNAs. Companies offering premade arrays that contain less than full-length clones normally use regions of the genes that are specific to that gene to prevent false positives arising through cross-hybridization. Sequence verification of cDNA clone identity is necessary because of errors in identifying specific clones from cDNA libraries and databases. Premade DNA arrays printed on membranes are currently or imminently available for human, mouse, and rat. In most cases they contain DNA sequences representing several thousand different sequence clusters or genes as delineated through the National Center for Biotechnology Information UniGene Project (2). Many of these different UniGene clusters (putative genes) are represented only by expressed sequence tags (ESTs).

Array Printing 

Arrays are typically printed on one of two types of support matrix. Nylon membranes are used by most off-the-shelf array providers, such as Clontech Laboratories, Inc. (Palo Alto, CA), Genome Systems, Inc. (St. Louis, MO), and Research Genetics, Inc. (Huntsville, AL). Microarrays such as those produced by Affymetrix, Inc. (Santa Clara, CA), Incyte Pharmaceuticals, Inc. (Palo Alto, CA), and many do-it-yourself (DIY) arraying groups use glass wafers or slides. Although standard microscope slides may be used, they must be prepared to facilitate sticking of the DNA to the glass. Several different coatings have been successfully used, including silane and lysine. The coating of slides can easily be carried out in the laboratory, but many prefer the convenience of precoated slides available from suppliers.

Once the support matrix has been prepared, the DNA elements can be applied by several methods. Affymetrix, Inc., has developed a unique photolithographic technology for attaching oligonucleotides to glass wafers. More commonly, DNA is applied by either noncontact or contact printing. Noncontact printers can use thermal, solenoid, or piezoelectric technology to spray aliquots of solution onto the support matrix and may be used to produce slide- or membrane-based arrays. Cartesian Technologies, Inc. (Irvine, CA) has developed nQUAD technology for use in its PixSys printers. The system couples a syringe pump with a microsolenoid valve, a combination that provides rapid quantitative dispensing of nanoliter volumes (down to 4.2 nL) over a variable volume range. A different approach to noncontact printing uses a solid pin and ring combination (Genetic MicroSystems, Inc., Woburn, MA). This system (Figure 1) allows a broader range of samples, including cell suspensions and particulates, because the printing head cannot be blocked up in the same way as a spray nozzle. Fluid transfer is controlled in this system primarily by the pin dimensions and the force of deposition, although the nature of the support matrix and the sample will also affect transfer to some degree.

In contact printing, the pin head is dipped in the sample and then touched to the support matrix to deposit a small aliquot. Split pins were one of the first contact-printing devices to be reported and are the suggested format for DIY arrayers, as described by Brown (5). Split pins are small metal pins with a precise groove cut vertically in the middle of the pin tip. In this system, 1-48 split pins are positioned in the pin head. The split pins work by simple capillary action, not unlike a fountain pen: when the pin heads are dipped in the sample, liquid is drawn into the pin groove. A small (fixed) volume is then deposited each time the split pins are gently touched to the support matrix. Sample (100-300 pL, depending on a variety of parameters) can be deposited on multiple slides before refilling is required, and array densities of > 2,500 spots/cm² may be produced. The deposit volume depends on the split size, sample fluidity, and the speed of printing. Split pins are relatively simple to produce and can be made in-house if a suitable machine shop is available. Alternatively, they can be obtained directly from companies such as TeleChem International, Inc. (Sunnyvale, CA).
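To put the density figure in perspective: assuming the spots sit on a square grid (an assumption, since the layout is not stated), a density ρ of 2,500 spots/cm² corresponds to a center-to-center pitch p of

```latex
p = \frac{1}{\sqrt{\rho}}
  = \frac{1\ \mathrm{cm}}{\sqrt{2500}}
  = \frac{1\ \mathrm{cm}}{50}
  = 0.02\ \mathrm{cm}
  = 200\ \mu\mathrm{m}
```

so each spot and its surrounding gap occupy roughly a 200 µm × 200 µm cell, and densities above 2,500 spots/cm² imply a pitch below 200 µm.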

Irrespective of their source, printers should be run through a preprint sequence prior to producing the actual experimental arrays: the first 100 or so spots of a new run tend to be somewhat variable. Factors affecting spot reproducibility include slide treatment homogeneity, sample differences, and instrument errors. Other factors that come into play include clean ejection of the drop and clogging (for nQUAD printing) and mechanical variations and long-term alteration of the print-head surface (for solid and split pins). However, with careful preparation it is possible to get a coefficient of variation for spot reproducibility below 10%.
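The coefficient of variation quoted above is the standard deviation of replicate spot intensities divided by their mean. A minimal Python sketch of that check follows. The function names and replicate intensities are hypothetical, and real quantitation software would first subtract local background from each spot.

```python
from statistics import mean, stdev


def coefficient_of_variation(intensities):
    """Percent CV of replicate spot intensities: sample SD / mean x 100."""
    if len(intensities) < 2:
        raise ValueError("need at least two replicate spots")
    m = mean(intensities)
    if m == 0:
        raise ValueError("mean intensity is zero")
    return 100.0 * stdev(intensities) / m


def passes_cv_threshold(intensities, max_cv_percent=10.0):
    """True if replicate spots vary by less than max_cv_percent (default 10%)."""
    return coefficient_of_variation(intensities) < max_cv_percent


# Hypothetical background-subtracted intensities for one clone spotted six times
replicates = [1020.0, 980.0, 1005.0, 995.0, 1010.0, 990.0]
print(f"CV = {coefficient_of_variation(replicates):.2f}%")
print("acceptable:", passes_cv_threshold(replicates))
```

For these made-up numbers the CV is about 1.45%, well inside the 10% target.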

One potential printing problem is sample carryover. Repeated washing, blotting, and drying (vacuum) of print pins between samples is normally effective at reducing sample carryover to negligible amounts. Printing should also be carried out in a controlled environment. Humidified chambers are available in which to place printers. These help prevent dust contamination and produce a uniform drying rate, which is important in determining spot size, quality, and reproducibility.

In summary, although several printing technologies are available, none are particularly outstanding, and the bottom line is that they are still at a relatively early stage of evolution.

Array Hybridization 

The hybridization protocol is, practically speaking, relatively straightforward, and those with previous experience in blotting should have little difficulty. Array hybridizations are, in essence, reverse Southern/Northern blots: instead of applying a labeled probe to the target population of DNA/RNA, the labeled population is applied to the probe(s). With membrane-based arrays, the control and treated mRNA populations are normally converted to cDNA and labeled with a radioactive phosphorus isotope in the process. These labeled populations are then hybridized independently to parallel or serial arrays, and the hybridization signal is detected with a phosphorimager. A less commonly used alternative to radioactive probes is enzymatic detection. The probe may be biotinylated or haptenylated, or have alkaline phosphatase or horseradish peroxidase attached, and hybridization is detected by an enzymatic reaction yielding a color (4). Differences in hybridization signals can be detected by eye or, more accurately, with the help of digital imaging and commercially available software. The labeling of the test populations for slide-based microarrays uses a slightly different approach. The probe typically consists of two samples of polyA+ RNA (usually from a treated and a control population) that are converted to cDNA; in the process, each is labeled with a different fluor. The independently labeled probes are then mixed together and hybridized to a single microarray slide, and the resulting combined fluorescent signal
Figure 1. Genetic MicroSystems (Woburn, MA) pin ring system for printing arrays. The pin ring combination consists of a circular open ring oriented parallel to the sample solution, with a vertical pin centered over the ring. When the ring is dipped into a solution and lifted, it withdraws an aliquot of sample held by surface tension. To spot the sample, the pin is driven down through the ring and a portion of the solution is transferred to the bottom of the pin. The pin continues to move downward until the pendant drop of solution makes contact with the underlying surface. The pin is then lifted, and gravity and surface tension cause deposition of the spot onto the array. Figure from Flowers et al. (14), with permission from Genetic MicroSystems.

normalization, it is possible to determine the ratio of fluorescent signals from a single hybridization of a slide-based microarray.
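The ratio calculation described above can be sketched as follows. This sketch assumes background-subtracted intensities and uses global median centering of log2 ratios. That is one common normalization choice, not a method specified in this report, and it presumes that most genes on the array are unchanged. The channel assignment (Cy5 treated, Cy3 control) and the intensities are hypothetical.

```python
import math
from statistics import median


def normalized_log2_ratios(cy5, cy3):
    """Per-spot log2(Cy5/Cy3) ratios after global median centering.

    Assumes background-subtracted, positive intensities and that most genes
    are unchanged, so the median log ratio is forced to zero.
    """
    if len(cy5) != len(cy3):
        raise ValueError("channel lengths differ")
    raw = [math.log2(r / g) for r, g in zip(cy5, cy3)]
    offset = median(raw)  # overall dye/scan imbalance estimated from all spots
    return [x - offset for x in raw]


cy5 = [2000.0, 800.0, 400.0, 1600.0, 100.0]  # treated sample (hypothetical)
cy3 = [1000.0, 400.0, 200.0, 200.0, 50.0]    # control sample (hypothetical)
print(normalized_log2_ratios(cy5, cy3))
```

In this toy example every raw ratio is 2 except the fourth, which is 8. After centering, only the fourth spot departs from zero, with a log2 ratio of 2, i.e., a 4-fold relative increase.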

cDNA derived from control and treated populations of RNA is most commonly hybridized to arrays, although subtractive hybridization or differential display reactions may also be used. Fluorophore- or radiolabeled nucleotides are directly incorporated into the cDNA in the process of converting RNA to cDNA. Alternatively, 5′ end-labeled primers may be used for cDNA synthesis. These are labeled with a fluorophore for direct visualization of the hybridized array. Instead, biotin or a hapten may be attached to the primer, in which case fluor-labeled streptavidin or antibody must be applied before a signal can be generated. The most commonly used fluorophores at present are cyanine (Cy)3 and Cy5 (Amersham Pharmacia Biotech AB, Uppsala, Sweden). However, the relative expense of these fluorescent conjugates has driven a search for cheaper alternatives. Fluorescein, rhodamine, and Texas red have all been used, and companies such as Molecular Probes, Inc. (Eugene, OR) are developing a series of labeled nucleotides with a wide range of excitation and emission spectra which may prove



Table 1. Advantages and disadvantages of different microarray scanning systems.

Analysis of DNA Microarrays

Membrane-based arrays are normally analyzed on film or with a phosphorimager, whereas chip-based arrays require more specialized scanning devices. These can be divided into three main groups: the charge-coupled device camera systems, the nonconfocal laser scanners, and the confocal laser scanners. The advantages and disadvantages of each system are listed in Table 1.

Because a typical spot on a microarray can contain > 10^[...] molecules, it is clear that a large variation in signal strength may occur. Current scanners cannot work across this many orders of magnitude (4 or 5 is more typical). However, the scanning parameters can normally be adjusted to collect more or less signal, such that two or three scans of the same array should permit the detection of rare and abundant genes.
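One way to combine the two or three scans mentioned above is sketched below. The idea is to keep the high-sensitivity reading for each spot unless it is saturated, and otherwise substitute the low-sensitivity reading rescaled to high-sensitivity units. The 16-bit saturation ceiling (65535), the assumption of a linear detector response, and the function name are illustrative assumptions, not details of any particular scanner.

```python
from statistics import median

SATURATION = 65535  # assumed 16-bit scanner ceiling


def merge_scans(high_gain, low_gain, saturation=SATURATION):
    """Combine a high-sensitivity and a low-sensitivity scan of the same array.

    Spots saturated in the high-gain scan are replaced by their low-gain value,
    rescaled by the median high/low ratio of spots that are unsaturated in the
    high-gain scan and nonzero in the low-gain scan. Assumes linear response.
    """
    ratios = [h / l for h, l in zip(high_gain, low_gain)
              if h < saturation and l > 0]
    if not ratios:
        raise ValueError("no spots usable for estimating the scale factor")
    scale = median(ratios)
    return [h if h < saturation else l * scale
            for h, l in zip(high_gain, low_gain)]


high = [120.0, 4000.0, 65535.0, 65535.0]   # hypothetical high-gain readings
low = [12.0, 400.0, 9000.0, 20000.0]       # same spots, low-gain scan
print(merge_scans(high, low))
```

Estimating the scale factor from spots that are unsaturated in the high-gain scan avoids relying on a nominal gain setting, which may not translate exactly into a signal ratio.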

When a microarray is scanned, the fluorescent images are captured by software normally included with the scanner. Several commercial suppliers provide additional software for quantifying array images, but the software tools are constantly evolving to meet the developing needs of researchers, and it is prudent to define one's own needs and clarify the exact capabilities of the software before its purchase. Issues that should be considered include the following:

• Can the software locate offset spots?

• Can it quantitate across irregular hybridization signals?

• Can the arrayed genes be programmed in for easy identification and location?

• Can the software connect via the Internet to databases containing further information on the gene(s) of interest?

One of the key issues raised at the workshop was the sensitivity of microarray technology. Experiments by General Scanning, Inc. (Watertown, MA) have shown that by using the Cy dyes and their scanner, signal can be detected down to levels of < 1 fluor molecule per square micrometer, which translates to detecting a rare message at approximately one copy per cell or less.

Array Applications 

Although arrays are an emerging technology certain to undergo improvement and alteration, they have already been applied usefully to a number of model systems. Arrays are at their most powerful when they contain the entire genome of the species they are being used to study. For this reason, they have strong support among researchers utilizing yeast and Caenorhabditis elegans (5). The genomes of both of these species have been sequenced and, in the case of yeast, deposited onto arrays for



CCD, charge-coupled device.
From Kawasaki (13).

elegans knockouts can be made simply by soaking the worms in an antisense solution of the gene to be knocked out.

By a process of systematic gene disruption, it is now possible to examine the cause-and-effect relationships between different genes in these simple organisms. This kind of approach should help elucidate biochemical pathways and genetic control processes, deconvolute polygenic interactions, and define the architecture of the cellular network. A simple case study of how this can be achieved was presented by Butow [University of Texas Southwestern Medical Center, Dallas, TX (Figure 2)]. Although it is the phenotypic result of a single gene knockout that is being examined, the effect of such a perturbation will almost always be polygenic. Polygenic interactions will become increasingly important as researchers begin to move away from single gene systems when examining the nature of toxicologic responses to external stimuli. This is especially important in toxicology because the phenotype produced by a given environmental insult is never the result of the action of a single gene; rather, it is a complex interaction of one or multiple cellular pathways. Phenomena such as quantitative traits (the continuous variation of phenotype), epistasis (the effect of alleles of one or more genes on the expression of other genes), and penetrance (the proportion of individuals of a given genotype that display a particular phenotype) will become increasingly evident and important as toxicologists push toward the ultimate goal of matching the responses of individuals to different environmental stimuli.
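The logic of deducing regulatory relationships from a knockout, as in the Butow case study, can be illustrated with a toy sign-propagation model. This is a hypothetical sketch, not the method presented at the workshop. The genes r1, r2, and g are invented names, edges are labeled +1 (activation) or -1 (repression), and the network is assumed to be linear, with a single path between any two genes.

```python
def predicted_change(edges, deleted, target):
    """Predict the direction of change in `target` after deleting `deleted`.

    `edges` maps each regulator to a list of (target, sign) pairs, where sign is
    +1 for activation and -1 for repression. Assuming a single regulatory path,
    deleting a gene removes its net effect, so the prediction is the negated
    product of edge signs along that path: -1 means expression should fall,
    +1 means it should rise, and 0 means `target` is not downstream.
    """
    stack = [(deleted, 1)]  # (gene, product of edge signs from the deleted gene)
    seen = {deleted}
    while stack:
        node, sign = stack.pop()
        for child, edge_sign in edges.get(node, []):
            path_sign = sign * edge_sign
            if child == target:
                return -path_sign
            if child not in seen:
                seen.add(child)
                stack.append((child, path_sign))
    return 0


# Hypothetical two-step cascade: r2 regulates r1, which activates g
activating = {"r2": [("r1", +1)], "r1": [("g", +1)]}
repressing = {"r2": [("r1", -1)], "r1": [("g", +1)]}
print(predicted_change(activating, "r2", "g"))  # -1: g should decrease
print(predicted_change(repressing, "r2", "g"))  # +1: g should increase
```

Read in reverse, the direction in which g moves after deleting r2 indicates whether r2 acts positively or negatively. Because real perturbations are polygenic, as the text stresses, networks with several paths would need more careful treatment than returning the first path found.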

Analysis of the transcriptome (the expression level of all the genes in a given cell population) was a use of arrays addressed by several speakers. Unfortunately, current gene nomenclature is often confusing in that single genes are allocated multiple names (usually as a result of independent discovery by different laboratories), and there was a call for standardization of gene nomenclature. Nevertheless, once a transcriptome has been assembled it can then be



transcriptomes for human, rat, and mouse. In a slightly different approach, Nuwaysir et al. (5) describe how the NIEHS assembled what is effectively a "toxicological transcriptome": a library of human and mouse genes that have previously been proven or implicated in responses to toxicologic insults. Clontech Laboratories, Inc. (Palo Alto, CA), has begun a similar process by developing stress/toxicology filter arrays of rat, mouse, and human genes. Thus, rather than being tissue or cell specific, these stress/toxicology arrays can be used across a variety of model systems to look for alterations in the expression of toxicologically important genes and define the new field of toxicogenomics. The potential to identify toxicant families based on tissue- or cell-specific gene expression could revolutionize drug testing. These molecular signatures or fingerprints could not only point to the possible toxicity/carcinogenicity of newly discovered compounds (Figure 5), but also aid in elucidating their mechanism of action through identification of gene expression networks. By extension, such signatures could provide easily identifiable biomarkers to assess the degree, time, and nature of exposure.
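Matching an unknown compound's expression profile against reference signatures, as envisaged above, can be sketched as a nearest-signature search using Pearson correlation. The class labels, the gene panel, and all numbers are hypothetical, and correlation is only one of several possible similarity measures.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def best_matching_signature(profile, signatures):
    """Return (class name, r) for the reference signature most correlated with profile."""
    scores = {name: pearson(profile, sig) for name, sig in signatures.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


signatures = {  # hypothetical mean log2 ratios over the same four genes
    "class_A": [2.0, 1.5, 0.0, -0.5],
    "class_B": [-0.2, 0.1, 2.5, 1.8],
}
unknown = [1.8, 1.2, 0.2, -0.4]  # hypothetical profile of a new compound
print(best_matching_signature(unknown, signatures))
```

In practice a call would also need a minimum correlation below which the compound is reported as unlike any known class.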

DNA arrays are primarily a tool for examining differential gene expression in a given model. In this context they are referred to as closed systems because they lack the ability of other differential expression technologies, e.g., differential display and subtractive hybridization, to detect previously unknown genes not present on the array. This would appear to limit the power of DNA arrays to the imaginations and preconceptions of the researcher in selecting genes previously characterized and thought to be involved in the model system. However, the various genome sequencing projects have created a new category of sequence, the EST, that has partially mitigated this deficiency. ESTs are cDNAs expressed in a given tissue that, although they may share some degree of sequence similarity to previously characterized genes, have not been assigned a specific genetic identity. By incorporating EST clones into an array, it is possible to monitor



significance in the model system. Filter arrays from Research Genetics and slide arrays from Incyte Pharmaceuticals both incorporate large numbers of ESTs from a variety of species.

A further use of microarrays is the identification of single nucleotide polymorphisms (SNPs). These genomic variations are abundant (they occur approximately every 1 kb or so), and some of them are the basis of the restriction fragment length polymorphism analysis used in forensics. Affymetrix, Inc., designed chips that contain multiple repeats of the same gene sequence, with each position present with all four possible bases. After the hybridization of the sample, the degree of hybridization to the different sequences can be measured and the exact sequence of the target gene deduced. SNPs are thought to be of vital importance in drug metabolism and toxicology. For example, single base differences in the regulatory region or active site of some genes can account for huge differences in the activity of that gene. Such SNPs are thought to explain why some people are able to metabolize certain xenobiotics better than others. Thus, arrays provide a further tool for the toxicologist investigating the nature of susceptible subpopulations and toxicologic response.
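The base-calling step for the four-probes-per-position design described above can be sketched as picking, at each position, the probe that hybridizes most strongly. This is a simplified illustration: the intensities, the 1.5× ambiguity threshold, and the function name are assumptions, and real resequencing-array designs and calling algorithms are considerably more elaborate. For example, a heterozygous sample gives two strong probes, which this sketch simply reports as "N".

```python
BASES = "ACGT"


def call_bases(intensities, min_ratio=1.5):
    """Deduce a sequence from a four-probe-per-position tiling array.

    `intensities` holds one dict per position, mapping each base (A, C, G, T)
    to the signal of the probe carrying that base at the interrogated position.
    The brightest probe gives the call unless it is less than `min_ratio` times
    brighter than the runner-up, in which case 'N' is reported.
    """
    calls = []
    for position in intensities:
        ranked = sorted(BASES, key=lambda base: position[base], reverse=True)
        best, runner_up = ranked[0], ranked[1]
        if position[best] >= min_ratio * position[runner_up]:
            calls.append(best)
        else:
            calls.append("N")  # no clear winner: ambiguous or heterozygous
    return "".join(calls)


# Hypothetical probe signals for a 4-base stretch; reference sequence ACGA
reference = "ACGA"
probes = [
    {"A": 900, "C": 80, "G": 60, "T": 70},
    {"A": 50, "C": 850, "G": 90, "T": 40},
    {"A": 60, "C": 70, "G": 100, "T": 880},  # T where the reference has G
    {"A": 500, "C": 40, "G": 30, "T": 450},  # two strong probes: ambiguous
]
sample = call_bases(probes)
snps = [i for i, (ref, obs) in enumerate(zip(reference, sample))
        if obs not in (ref, "N")]
print(sample, snps)
```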

There are still many wrinkles to be ironed out before arrays become a standard tool for toxicologists. The main issues raised at the workshop by those with hands-on experience were the following:

• Expense: the cost of purchasing/contracting this technology is still too great for many individual laboratories.






Figure 2. Potential effects of gene knockout within positively and negatively regulated gene expression networks. f1 is limiting in wild type for expression of g. (A) A simple, two-component, linear regulatory network operating on gene g, where f1 is a positive effector of g and f2 is either a positive or negative effector of f1. This network could be deduced by examining the consequence of (B) deleting f2 on the expression of f1 and g, where the expression of g would be decreased or increased depending on whether f2 was a positive or negative regulator. These and other connected components of even greater complexity could be revealed by genome-wide expression analysis. From Butow (15).



• Clones: the logistics of identifying, obtaining, and maintaining a set of nonredundant, noncontaminated, sequence-verified, species/cell/tissue/field-specific clones.

• Use of inbred strains: where whole-organism models are being used, the use of inbred strains is important to reduce the potentially confusing effects of the individual variation typically seen in outbred populations.

• Probe: the need for relatively large amounts of RNA, which limits the type of sample (e.g., biopsy) that can be used. Also, different RNA extraction methods can give different results.

• Specificity: the ability to discriminate accurately between closely related genes (e.g., the cytochrome P450 family) and splice variants.

• Quantitation: the quantitation of gene expression using gene arrays is still open to debate. One reason for this is differential incorporation of the labeling dyes. However, the main difficulty lies in knowing what to normalize against. One option is to include a large number of so-called housekeeping genes in the array. However, the expression of these genes often changes depending on the tissue and the toxicant, so it is necessary to characterize the expression of these genes in the model system before utilizing them. This is clearly not a viable option when screening multiple new compounds. A second option is to include on the array genes from a nonrelated species (e.g., a plant gene on an animal array) and to spike the probe with synthetic RNA(s) complementary to the gene(s).

• Reproducibility: this is sometimes questionable, and a figure of approximately two or three repeats was used as the minimum number required to confirm initial findings. Again, however, most people [...] use of Northern blots or reverse transcriptase-PCR to confirm findings.

• Sensitivity: concerns were voiced about the number of target molecules that must be present in a sample for them to be detected on the array.

• Efficiency: reproducible identification of 1.5- to 2-fold differences in expression was reported, although the number of genes that undergo this level of change and remain undetected is open to debate. It is important that this level of detection be ultimately [...], as it is commonly perceived that some important transcription factors and their regulators respond at such low levels. In most cases, 3- to [...]-fold was the minimum change that most were happy to accept.

• Bioinformatics: perhaps the greatest concern was how to interpret the data with the greatest accuracy and efficiency. The biggest headache is trying to identify networks of gene expression that are common to different treatments or doses. The amount of data from a single experiment is huge. It may be that, in the future, several groups individually equipped with specialized software algorithms for studying their favorite genes or gene systems will be able to share the same hybridized chips. Thus, arrays could usher in a new perspective on collaboration and the sharing of data.
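Several of the points above (normalization against spiked-in control RNA, replicate hybridizations, and a minimum fold-change cutoff) can be combined in a short sketch. It assumes each sample was spiked with the same amount of synthetic control RNA, so that differences in spike-in signal between channels reflect labeling and scanning differences rather than biology. Replicates are averaged on a log scale (a geometric mean), and the default 3-fold cutoff echoes the minimum most participants would accept. The data layout and names are hypothetical.

```python
import math


def spike_scale(treated_spikes, control_spikes):
    """Factor that equalizes total spike-in signal between the two channels."""
    return sum(control_spikes) / sum(treated_spikes)


def replicated_fold_change(replicates, threshold=3.0, min_replicates=2):
    """Geometric-mean fold change of one gene across replicate hybridizations.

    Each replicate is a dict holding the gene's 'treated' and 'control' signals
    and the 'treated_spikes'/'control_spikes' signals of the spike-in spots.
    Returns (fold_change, called); a change is called only with enough
    replicates and at least `threshold`-fold change in either direction.
    """
    if not replicates:
        raise ValueError("no replicate hybridizations supplied")
    logs = []
    for rep in replicates:
        scale = spike_scale(rep["treated_spikes"], rep["control_spikes"])
        logs.append(math.log2(rep["treated"] * scale / rep["control"]))
    fold = 2 ** (sum(logs) / len(logs))
    called = len(logs) >= min_replicates and (
        fold >= threshold or fold <= 1 / threshold
    )
    return fold, called


reps = [  # hypothetical signals for one gene in two hybridizations
    {"treated": 1200.0, "control": 100.0,
     "treated_spikes": [400.0, 600.0], "control_spikes": [200.0, 300.0]},
    {"treated": 800.0, "control": 100.0,
     "treated_spikes": [500.0], "control_spikes": [500.0]},
]
fold, called = replicated_fold_change(reps)
print(f"{fold:.2f}-fold, called: {called}")
```

Here the spike-in correction halves the first replicate's treated signal, giving ratios of 6 and 8, whose geometric mean is about 6.93. A single hybridization would not be called however large its ratio, mirroring the two-to-three-repeat minimum.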

EPAMAC 

Perhaps the main reason most scientists are unable to use array technology is the high cost involved, whether buying off-the-shelf membranes, using contract printing services, or producing chips in-house. In view of this, researchers at the RTD/NHEERL initiated the EPAMAC. This consortium brings together scientists from the EPA and a number of extramural labs with the aim of developing microarray capability through the sharing of resources and data. EPAMAC researchers are primarily interested in the developmental and toxicologic changes seen in testicular and breast tissue, and a portion of the workshop was set aside for EPAMAC members to share their ideas on how the experimental application of microarrays could facilitate their research. One of the central areas of interest to EPAMAC members is the effect of xenobiotics on male fertility and reproductive health. Of greatest concern is the effect of exposure during critical periods of development and germ cell differentiation (9), and how this may compromise sperm counts and quality following sexual maturation (10). As well as spermatogenic tissue, there is also interest in how residual mRNA found in mature sperm (11) could be used as an indicator of previous xenobiotic effects (it is easier to obtain a semen sample than a testicular biopsy). Arrays will be used to examine and compare the effect of exposure to heat and chemicals on testicular and epididymal gene expression profiles, with the aim of establishing relationships/associations between changes in developmental landmarks and the effects on sperm count and quality. Cluster, pattern, and other analysis of such data should help identify hidden relationships between genes that may reveal potential mechanisms of action and uncover roles for genes with unknown functions.
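As one simple example of the cluster analysis mentioned above, genes can be grouped when their expression profiles across conditions are highly correlated. The sketch below uses single-linkage grouping at a Pearson correlation threshold, a choice made here for brevity (hierarchical and k-means clustering are common alternatives). The gene names, profiles, and the 0.9 threshold are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def cluster_genes(profiles, min_r=0.9):
    """Group genes linked by chains of pairwise correlations >= min_r."""
    parent = {gene: gene for gene in profiles}

    def find(gene):
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving keeps trees shallow
            gene = parent[gene]
        return gene

    for a, b in combinations(profiles, 2):
        if pearson(profiles[a], profiles[b]) >= min_r:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for gene in profiles:
        clusters.setdefault(find(gene), []).append(gene)
    return sorted(sorted(members) for members in clusters.values())


profiles = {  # hypothetical log2 ratios across four exposure conditions
    "gene1": [0.1, 1.0, 2.0, 3.1],
    "gene2": [0.0, 0.9, 2.1, 2.9],
    "gene3": [3.0, 2.0, 1.1, 0.0],
    "gene4": [2.9, 2.1, 0.9, 0.1],
    "gene5": [0.0, 2.0, 0.0, 2.0],
}
print(cluster_genes(profiles))
```

A gene of unknown function that lands in a cluster with characterized genes becomes a candidate for a related role. This is the kind of hidden relationship the text anticipates, though co-expression alone does not establish function.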

Summary 

The full impact of DNA arrays may not be seen for several years, but the response to this regional workshop indicates the high level of interest that they foster. Apart from educating attendees about the various technologies in this field, this workshop brought together a number of researchers from the Research Triangle Park area who are already using DNA arrays. The interest in sharing ideas and experiences led to the initiation of a Triangle array user's group.



Array technology is still in its infancy. This means that the hardware is still improving and there is no current consensus on standard procedures, quantitation, and interpretation. Consistency in spotting and scanning arrays is not yet optimized, and this is one of the most critical requirements of any experiment. In addition, one of the dark regions of array technology (strife in the courts over who owns what portions of it) has further muddled the future and is a potential barrier to the development of consensus procedures.

Perhaps the greatest hurdle for the application of arrays is the actual interpretation of data. No specialists in bioinformatics attended the workshop, largely because they are rare and because as yet no one seems clear on the best method of approaching data analysis and interpretation. Cross-referencing results from multiple experiments (time, dose, repeats, different animals, different species) to identify commonly expressed genes is a great challenge. In most cases, we are still a long way from understanding how the expression of gene X is related to the expression of gene Y and from ordering gene expression to delineate causal relationships.

To the ordinary scientist in the typical laboratory, however, the most immediate problem is a lack of affordable instrumentation. One can purchase premade membranes at relatively affordable prices. Although these may be useful in identifying individual genes to pursue in more detail using other methods, the numbers that would be required for even a small routine toxicology experiment prohibit this as a truly viable approach. For the toxicologist, there is a need to carry out multiple experiments: dose responses, time curves, multiple animals, and repeats. Glass-based DNA arrays are most attractive in this context because they can be prepared in large batches from the same DNA source and accommodate control and treated samples on the same chip. Another problem with current off-the-shelf arrays is that they often do not contain one or more of the particular genes a group is interested in. One alternative is to obtain and/or produce a set of custom clones and have contract printing of membranes or slides carried out by a company such as Genomic Solutions, Inc. (Ann Arbor, MI). This approach is less expensive than buying one's own entire system, although at some point it might make economic sense to print one's own arrays.

Finally, DNA arrays are currently a team effort. They are a technology that uses a wide range of skills, including engineering, statistics, molecular biology, chemistry, and bioinformatics. Because most individuals are skilled in only one or perhaps two of these areas, it appears that success with arrays may best be expected from teams of collaborators who together have each of these skills.

Those considering array applications may be amused or goaded on by the following quote from Fortune magazine (12):

Microprocessors have reshaped our economy, spawned vast fortunes and changed the way we live. Gene chips could be even bigger.

Although this comment may have been designed to excite the imagination rather than accurately reflect the truth, it is fair to say that the age of functional genomics is upon us. DNA arrays look set to be an important tool in this new age of biotechnology and will likely contribute answers to some of toxicology's most fundamental questions.
Subject: RE: [Fwd: Toxicology Chip]
Date: Mon, 3 Jul 2000 08:09:45 -0400
From: "Afshari, Cynthia" <afshari@niehs.nih.gov>
To: "'Diana Hamlet-Cox'" <dianahc@incyte.com>



You can see the list of clones that we have on our [...] chip at [...]

We selected a subset of genes (2000K) that we believed critical [...]
response and basic cellular processes and added a set of clones [...]
this. We have included a set of control genes (80+) that were selected [...]
because they did not change across a large set of array
experiments. However, we have found that some of these genes change
significantly after tox treatments and are in the process of looking at the
variation of each of these 80+ genes across our experiments.

Our chips are constantly changing and being updated and we hope that our
data will lead us to what the toxchip should really be.

I hope this answers your question.

Cindy Afshari



> From: Diana Hamlet-Cox
> Sent: Monday, June 26, 2000 8:52 PM
> To: afshari@niehs.nih.gov
> Subject: [Fwd: Toxicology Chip]
> 

> Dear Dr. Afshari,
> 

> Since I have not yet had a response from Bill Grigg, perhaps he was not
> the right person to contact.
>
> Can you help me in this matter? I don't need to know the sequences
> necessarily, but I would like very much to know what types of sequences
> are being used, e.g., GPCRs (more specific?), ion channels, etc.
> 

> Diana Hamlet-Cox
>
> Original Message
> Subject: Toxicology Chip
> Date: Mon, 19 Jun 2000 18:31:48 -0700
> From: Diana Hamlet-Cox <dianahc@incyte.com>
> Organization: Incyte Pharmaceuticals
> To: grigg@niehs.nih.gov
> 

> Dear Colleague: 
> 

> I am doing literature research on the use of expressed genes as
> pharmacotoxicology markers, and found the Press Release dated February
> 29, 2000 regarding the work of the NIEHS in this area. I would like to
> know if there is a resource I can access (or you could provide?) that
> would give me a list of the 12,000 genes that are on your Human ToxChip
> Microarray. In particular, I am interested in the criteria used to
> select sequences for the ToxChip, including any control sequences
> included in the microarray.
>
> Thank you for your assistance in this request.
>
> Diana Hamlet-Cox, Ph.D.
> Incyte Genomics, Inc.



> This email message is for the sole use of the intended recipient(s) and
> may contain confidential and privileged information subject to
> attorney-client privilege. Any unauthorized review, use, disclosure or
> distribution is prohibited. If you are not the intended recipient,
> please contact the sender by reply email and destroy all copies of the
> original message.

> 
> 
These data were digitally recorded by standalone units [borrowed from Incorporated Research Institutions for Seismology (IRIS), Washington, DC] arranged in arrays 25 km long with
ments, Houston, TX) gravimeter and tied to the McMurdo gravity base station. The observed gravity anomaly was found to be within 0.5 mGal of that from a previous regional survey (2). Relative elevations along the profile were measured by pairs of barometers.

21. Holes were drilled by melting of the ice with hot water pumped under pressure. For the multichannel reflection work, 7.5-kg charges were spaced every 200 m along the profile. Two charges 1.6 km apart were detonated for each streamer location. This pattern resulted in an effective 120-channel receiving array 3 km long. The sources for the wide-angle reflection and refraction data were several explosions of 100 to 400 kg.
43. The BSR here shares similar characteristics with BSRs identified and drilled in low-latitude margins: it simulates the sea floor, cuts across stratigraphic units, exhibits a polarity reversal of the wavelet compared to the sea floor, and is overlain by an acoustically opaque layer. Bottom-simulating reflectors are caused by a thin layer of free gas at the base of a gas hydrate-cemented layer. Methane gas has been encountered in drill holes and cores around Antarctica, but despite the anomalously large depth of the Antarctic shelves, BSRs have rarely been documented [K. A. Kvenvolden, M. Golan-Bac, J. B. Rapp, in The Antarctic Continental Margin: Geology and Geophysics of Offshore Wilkes Land, S. L. Eittreim and M. A. Hampton, Eds.,
Three-Dimensional Structure of Myosin Subfragment-1: A Molecular Motor

Ivan Rayment,* Wojciech R. Rypniewski,† Karen Schmidt-Bäse,‡ Robert Smith, Diana R. Tomchick,§ Matthew M. Benning, Donald A. Winkelmann, Gary Wesenberg, Hazel M. Holden

Directed movement is a characteristic of many living organisms and occurs as a result of the transformation of chemical energy into mechanical energy. Myosin is one of three families of molecular motors that are responsible for cellular motility. The three-dimensional structure of the head portion of myosin, or subfragment-1, which contains both the actin and nucleotide binding sites, is described. This structure of a molecular motor was determined by single crystal x-ray diffraction. The data provide a structural framework for understanding the molecular basis of motility.



Motility is one of the characteristic features of many living organisms and involves the transduction of chemical into mechanical energy. Only a limited number of strategies have evolved to accomplish this task. At present, three major classes of molecular motors have been identified (myosin, dynein, and kinesin), all of which are important in cellular movement (1). Of these three proteins, the most abundant is myosin, which plays both a structural and an enzymatic role in muscle contraction and in intracellular motility.
The role of myosin in movement has been most clearly defined from the study of cross-striated skeletal muscle, which shows a high degree of structural organization. In striated muscle the basic contractile unit is the sarcomere, which consists of overlapping arrays of thick and thin filaments. During contraction, these filaments, which are composed primarily of myosin and actin, respectively, slide past one another, thereby shortening the sarcomere (2). Electron micrographs of muscle in rigor have revealed connections between the filaments in the overlap region, the so-called crossbridges. These crossbridges are formed by the globular regions of the myosin molecule and are responsible for force generation in the contractile process through the hydrolysis of adenosine triphosphate (ATP).

Myosin, which has a molecular size of about 520 kilodaltons, consists of two 220-kD heavy chains and two pairs of light chains that vary in molecular size depending on the source but are usually between 15 and 22 kD (3, 4). The molecule is highly asymmetric, consisting of two globular heads attached to a long tail. Each heavy chain forms the bulk of one head and intertwines with its neighbor to form the tail. Limited proteolytic digestion has shown that the myosin head, or subfragment-1 (S1), contains an ATP binding site, an actin binding site, and two light chain binding sites, and that the myosin rod, which is formed by a coiled coil of two α helices, accounts for the self-association of myosin at low ionic strength and the formation of the thick filament backbone (3). Spudich and co-workers have demonstrated that the globular head portions of myosin are sufficient to generate movement of actin in an in vitro motility assay (5).

Each globular head, derived from limited proteolysis, consists of a heavy chain fragment having a molecular size of 95 kD and two light chains, yielding a combined molecular size of ~130 kD (6). The two light chains differ in their structure and properties and are known by a variety of names. In this article they are referred to as the regulatory and essential light chains. Neither type is required for the adenosine triphosphatase (ATPase) activity of the head (7). In some species, however, these chains regulate or modulate the ATPase activity of myosin in the presence of actin (8, 9). Amino acid sequence analyses reveal that both light chains share considerable sequence similarity with calmodulin and troponin C, although most of the divalent cation binding sites have been lost during evolution (10).

During the last 40 years, enormous effort has been expended toward understanding the structure and function of the myosin head (11). Measurements from electron micrographs have suggested that the myosin head is pear-shaped, about 190 Å long and 50 Å wide at its thickest point (12). Molecular dimensions subsequently derived from studies of fixed thin sections cut from crystals of myosin S1 were consistent with these observations (13).

Although much biochemical and physical information has accumulated for myosin since 1950, structural knowledge of this protein or any other molecular motor has been lacking. We now describe the tertiary structure of the myosin head and suggest how this protein may serve to transduce energy from the hydrolysis of ATP into directed movement. We present the three-dimensional structure of myosin S1 at a nominal resolution of 2.8 Å, refined to an R factor of 22.3 percent for all x-ray data recorded in that range.

Crystallization of myosin subfragment-1. Myosin is an abundant protein that can easily be prepared in gram quantities. Likewise, the myosin head, which is readily cleaved from the rest of the molecule by mild proteolysis, can be prepared in large quantities. This soluble subfragment has been known for approximately 30 years and has resisted crystallization despite numerous attempts. In view of its central importance for understanding the molecular basis of muscle contraction, we undertook an alternative to the usual ways of obtaining x-ray quality crystals. The protein was first subjected to mild chemical modification of the lysine residues by reductive methylation. This chemical modification has long been used as a gentle way to introduce a radioactive label into a protein (14).

Considerable effort was expended to determine the optimal procedure for modifying the protein, since it was recognized that complete, homogeneous modification of the molecule was essential for obtaining high-quality crystals (Table 1). Many of the experiments necessary to derive the optimal protocol for methylation were performed in a parallel study on hen egg white lysozyme (15). In that study the three-dimensional structure of the modified protein was determined, refined to 1.8 Å resolution, and shown to be essentially identical to that of the native protein except for the modified lysine residues. Modification of the lysine residues in lysozyme dramatically changed its crystallization properties. The kinetic and structural effects of this treatment on myosin S1 are discussed below.

Table 1. Amino acid analysis of modified and native myosin S1 (60). Prior to modification, the protein, at 5 mg/ml, was dialyzed against 200 mM potassium phosphate, pH 7.5, 1 mM MgCl2. The protein was reductively methylated at 4°C by the sequential addition of 1 M dimethylamine borane complex dissolved in water (20 μl per milliliter of protein) and 1 M formaldehyde (40 μl per milliliter of protein) with rapid stirring. This process was repeated after 2 hours; a further portion (10 μl/ml) of dimethylamine borane complex was added after 2 hours, and the reaction mixture was kept overnight at 4°C in the dark. The reaction was quenched by the addition of 3.8 M ammonium sulfate to a final concentration of 1 M, and the mixture was then dialyzed for 48 hours against 2.5 M ammonium sulfate, 50 mM potassium phosphate at pH 6.7 to precipitate the protein (15, 49). All except three to four of the lysine residues were modified. The discrepancy between the total number of lysine residues in the native and modified protein may have arisen from a calibration error in the dimethyllysine standard. The analyses for histidine, methionine, and arginine are shown as controls.

                    Residues (no.)
Amino acid      Theoretical   Native   Modified
Lysine              103        96.2       4.2
Me1-Lys               0         0         0
Me2-Lys               0         0.6      96.7
Me3-Lys               3         3.6       3.4
Total lysine        106       100.4     104.3
Histidine            24        23.7      23.2
Methionine           39        39.4      38.9
Arginine             46        47.5      47.2
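As a quick arithmetic check on Table 1, the individual lysine species in each column should add up to the "Total lysine" row. The short Python sketch below does this; the dictionary layout and names are illustrative choices, not part of the original analysis.

```python
# Lysine species from Table 1 as (theoretical, native, modified) residue counts.
lysine_species = {
    "Lys": (103, 96.2, 4.2),
    "Me1-Lys": (0, 0.0, 0.0),
    "Me2-Lys": (0, 0.6, 96.7),
    "Me3-Lys": (3, 3.6, 3.4),
}
reported_totals = (106, 100.4, 104.3)   # "Total lysine" row of Table 1


def column_totals(species):
    """Sum each column (theoretical, native, modified) over all lysine species."""
    return tuple(round(sum(column), 1) for column in zip(*species.values()))


totals = column_totals(lysine_species)
for label, computed, reported in zip(("theoretical", "native", "modified"),
                                     totals, reported_totals):
    print(f"{label:<12} computed {computed:6.1f}   reported {reported:6.1f}")

# Share of the modified sample's lysine pool present as dimethyllysine
dimethyl_share = lysine_species["Me2-Lys"][2] / totals[2]
print(f"dimethyllysine share after modification: {dimethyl_share:.1%}")
```

With the values above, each computed total agrees with the reported row, and roughly 93 percent of the lysine in the modified sample is dimethylated.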

Myosin isolated from chicken pectoralis muscle consists of a mixed population of two isozymes caused by the existence of two species of the essential light chain (16). These light chains are referred to as A1 (21 kD) and A2 (16 kD). Amino acid sequence studies of the light chains have demonstrated that A1 and A2 are identical over their 142 COOH-terminal residues. The size difference is caused by an additional 41 amino acids present at the NH2-terminus of A1. These isozymes arise by alternative transcription and two modes of splicing from a single gene (17).

The crystals used in our study contained both isoforms of the essential light chain. Myosin S1 was prepared by digestion with papain in the presence of MgCl2 because the fragment produced under these conditions contained both the regulatory and essential light chains. The major drawback of papain as a proteolytic enzyme, however, was its lack of specificity. Apart from cleaving the heavy chain at the head-rod junction, it introduced additional proteolytic breaks into both the regulatory and A1 essential light chains. In addition, the regulatory light chain was partially phosphorylated by endogenous myosin light chain kinase. The myosin S1 was therefore prepared by an improved purification protocol that removed the heterogeneity arising from both proteolysis of the light chains and phosphorylation of the regulatory light chain (18).

Crystals were grown by batch methods from 1.35 M ammonium sulfate, 500 mM potassium chloride, and 50 mM potassium phosphate (pH 6.7) in the presence of 5 mM dithiothreitol and 0.5 mM sodium azide at a final protein concentration of 8 to 12 mg/ml. Crystallization was initiated by microseeding, and over a period of 2 to 3 months at 4°C the crystals grew as thick rods 1 to 2 mm long, 0.4 mm wide, and 0.3 mm thick. They belonged to space group C222₁ with unit cell dimensions of a = 98.4, b = 124.2, c = 274.9 Å, and one molecule in the asymmetric unit. These crystals were different from those originally reported (19) and arose from improvements in both the chemical modification procedure and the protein homogeneity.
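The cell dimensions and the one-molecule asymmetric unit allow a rough estimate of crystal packing. The article does not report this figure; the Python sketch below is an illustration under two assumptions: that S1 weighs about 130 kD (the value quoted earlier for the head with both light chains), and that Matthews' usual approximation for protein density applies.

```python
# Unit cell edges (Å) reported for the orthorhombic C222_1 crystals.
a, b, c = 98.4, 124.2, 274.9

ASU_PER_CELL = 8          # C222_1: 4 symmetry operators x 2 for C-centring
MOLECULES_PER_ASU = 1     # one S1 molecule per asymmetric unit (from the text)
MASS_DA = 130_000         # assumption: ~130 kD for S1 with both light chains


def matthews(cell_volume, n_molecules, mass_da):
    """Matthews coefficient V_M (Å^3/Da) and the usual solvent-fraction estimate."""
    vm = cell_volume / (n_molecules * mass_da)
    # 1.23 Å^3/Da corresponds to a protein partial specific volume of ~0.74 cm^3/g
    solvent = 1.0 - 1.23 / vm
    return vm, solvent


volume = a * b * c  # all cell angles are 90° for an orthorhombic cell
vm, solvent = matthews(volume, ASU_PER_CELL * MOLECULES_PER_ASU, MASS_DA)
print(f"cell volume {volume:,.0f} Å^3, V_M {vm:.2f} Å^3/Da, solvent ~{solvent:.0%}")
```

Under these assumptions V_M comes out near 3.2 Å^3/Da, a fairly solvent-rich lattice (roughly 60 percent solvent).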

Structure determination. The x-ray data were collected in two stages (20). First, x-ray data sets to 4.5 Å resolution for the native and heavy atom-containing crystals were recorded with an area detector in order to determine the positions of the metal binding sites. These data were then extended to 2.8 Å resolution with synchrotron radiation at Stanford University (SSRL) and Cornell University (CHESS). We recognized early that x-ray data collection and determination of the protein phases by multiple isomorphous replacement would be difficult unless care was taken to minimize the systematic errors introduced by differences between successive protein preparations. Consequently, for each stage of the heavy atom derivative data collection, a corresponding native data set was recorded from the same protein preparation. For each purification trial, approximately 700 mg of myosin S1 was prepared and set up for crystallization. Many attempts were made before a single preparation yielded sufficient crystals for x-ray data collection.

The structure was determined by a combination of multiple isomorphous replacement and solvent flattening. The first derivative solved was obtained from crystals soaked in trimethyllead acetate and proved to be highly isomorphous, with only four binding sites (21). It was used to determine the positions of the other heavy atom binding sites by difference Fourier techniques (Table 2). The positions and occupancies of the heavy atom sites were refined according to the origin-removed Patterson-function correlation method by the program HEAVY (22). The overall figures of merit for the area detector, CHESS, and SSRL synchrotron data were 0.47, 0.58, and 0.42, respectively.

The higher resolution x-ray data collected at SSRL were placed on the same scale as the area detector data and included as a block from 4.5 to 2.8 Å. Efforts to merge the overlapping data between the area detector and synchrotron data were unsatisfactory. However, the phase information from all three sources was combined throughout the entire resolution range via the phase probability coefficients (23).
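The combination step can be illustrated with Hendrickson-Lattman coefficients, one common way of encoding a phase probability distribution. The article defers to its own reference (23) for the coefficients actually used, so the form here is an assumption. In this representation P(φ) ∝ exp(A cos φ + B sin φ + C cos 2φ + D sin 2φ); multiplying independent distributions amounts to adding coefficients, and the figure of merit is the magnitude of the centroid ⟨exp(iφ)⟩. The function names and coefficient values in this Python sketch are hypothetical.

```python
import numpy as np


def phase_stats(hl, n=3600):
    """Best phase (degrees) and figure of merit for HL coefficients (A, B, C, D)."""
    A, B, C, D = hl
    phi = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    log_p = A * np.cos(phi) + B * np.sin(phi) + C * np.cos(2 * phi) + D * np.sin(2 * phi)
    p = np.exp(log_p - log_p.max())      # subtract the maximum for numerical stability
    p /= p.sum()                         # normalise on the uniform phase grid
    centroid = np.sum(p * np.exp(1j * phi))   # <exp(i*phi)>
    return float(np.degrees(np.angle(centroid)) % 360.0), float(abs(centroid))


def combine(*sources):
    """Multiplying independent phase distributions is equivalent to adding HL coefficients."""
    return tuple(np.sum(np.asarray(sources, dtype=float), axis=0))


# Hypothetical HL coefficients for a single reflection from the three data sets
area_detector = (1.2, 0.8, 0.3, -0.1)
chess = (0.9, 1.5, -0.2, 0.4)
ssrl = (0.5, 0.6, 0.1, 0.0)

for name, hl in [("area detector", area_detector), ("CHESS", chess), ("SSRL", ssrl),
                 ("combined", combine(area_detector, chess, ssrl))]:
    phase, fom = phase_stats(hl)
    print(f"{name:<14} best phase {phase:6.1f} deg   figure of merit {fom:.2f}")
```

Adding coefficients keeps each source's contribution explicit, which is why this form is convenient when, as here, the data sets themselves could not be merged satisfactorily.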




Fig. 1. Ramachandran plot of the main chain dihedral angles of all non-glycinyl residues in the model presented.
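Each point on a Ramachandran plot is a pair of backbone torsion angles: φ is defined by atoms C(i-1), N(i), Cα(i), C(i), and ψ by N(i), Cα(i), C(i), N(i+1). A minimal Python sketch of the torsion-angle calculation follows; the function names and the toy coordinates are illustrative and are not taken from the myosin model.

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees (IUPAC sign convention) for four atoms."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 /= np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone (phi, psi) for one residue from five consecutive backbone atoms."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)


# Toy geometry: a +90° torsion and a trans (180°) torsion
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```

Using arctan2 rather than an arccos of the dot product preserves the sign of the angle, which is what places a residue in the correct quadrant of the plot.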



These protein phases were improved by solvent flattening (24). The positions and occupancies of heavy atom binding sites were further refined against these modified phases (25). This gave an improved electron density map into which approximately 550 alanine residues were built with the program FRODO (26). The map showed good connectivity and many well-defined side chains.
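The idea behind solvent flattening (24) is that the disordered solvent region of the cell should have nearly constant density. Setting that region to its mean and transforming back yields improved phase estimates, which are then recombined with the measured amplitudes. The one-dimensional NumPy toy below illustrates only that cycle. It assumes the protein/solvent mask is already known (in practice the envelope is estimated from a smoothed map), omits the phase-combination weighting that real programs apply, and uses entirely synthetic data; the function name is hypothetical.

```python
import numpy as np


def solvent_flatten(amplitudes, phases, protein_mask, n_cycles=20):
    """Toy 1-D solvent flattening with a known protein mask.

    Each cycle: build a map from the observed amplitudes and current phases,
    set the solvent region to its mean density, and keep the phases of the
    flattened map as the new estimates.
    """
    for _ in range(n_cycles):
        rho = np.fft.ifft(amplitudes * np.exp(1j * phases)).real.copy()
        rho[~protein_mask] = rho[~protein_mask].mean()      # flatten the solvent
        phases = np.angle(np.fft.fft(rho))                  # updated phase estimates
    # Final map: observed amplitudes combined with the refined phases
    return np.fft.ifft(amplitudes * np.exp(1j * phases)).real, phases


rng = np.random.default_rng(0)
n = 256
protein_mask = np.zeros(n, dtype=bool)
protein_mask[:100] = True                                   # ~40% protein, ~60% solvent
true_map = np.where(protein_mask, rng.random(n), 0.0)
f_true = np.fft.fft(true_map)
amplitudes = np.abs(f_true)
start_phases = np.angle(f_true) + rng.normal(0.0, np.radians(60), n)   # noisy starting phases

start_map = np.fft.ifft(amplitudes * np.exp(1j * start_phases)).real
final_map, final_phases = solvent_flatten(amplitudes, start_phases, protein_mask)
print(f"map correlation before: {np.corrcoef(start_map, true_map)[0, 1]:.3f}")
print(f"map correlation after:  {np.corrcoef(final_map, true_map)[0, 1]:.3f}")
```

The observed amplitudes are never altered; only the phases change, which mirrors how the flattened map feeds back into further heavy atom refinement.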

Once several long segments were connected, the positions of these alanine residues were matched to the known amino acid sequence (27, 28). At this stage phase information from the partial model was combined with the heavy atom derivative phases by the program SIGMAA (29). The structure was refined concurrently with the model building process by the program package TNT (30). Once the model building was near completion, a cycle of refinement with X-PLOR (31) was performed to improve the conformations of the side chains. The strategy of alternating model building and refinement proved successful and steadily improved the estimates of the protein phases. Toward the end of the analysis there were clear segments of electron density corresponding to portions of the light chains that were completely missing in the original maps phased with heavy atom derivatives alone.

At present, 1072 residues (of a total of 1157) have been built into the electron density map. The model was refined to an R factor of 22.3 percent for all measured x-ray data between 30 and 2.8 Å, with root-mean-square deviations from ideal geometry of 0.018 Å for bond lengths, 2.5° for bond angles, and 0.013 Å for groups of atoms expected to be coplanar. No solvent molecules have yet been built into the electron density (Figs. 1 and 2).
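The two quality measures quoted here can be written out directly. The crystallographic R factor is R = Σ||F_obs| - |F_calc|| / Σ|F_obs|, and the geometric statistics are root-mean-square deviations of model bond lengths, angles, or planarity from ideal target values. The Python helpers below are a sketch with hypothetical names; they assume the calculated amplitudes are already on the scale of the observed ones, and the numbers in the example are made up.

```python
import numpy as np


def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|), in percent (Fcalc pre-scaled)."""
    f_obs = np.abs(np.asarray(f_obs, dtype=float))
    f_calc = np.abs(np.asarray(f_calc, dtype=float))
    return 100.0 * np.sum(np.abs(f_obs - f_calc)) / np.sum(f_obs)


def rms_deviation(model_values, ideal_values):
    """Root-mean-square deviation of model geometry from ideal targets."""
    diff = np.asarray(model_values, dtype=float) - np.asarray(ideal_values, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Hypothetical amplitudes and bond lengths (Å), for illustration only
print(f"R = {r_factor([100, 200, 50], [90, 210, 55]):.1f}%")
print(f"rms bond deviation = {rms_deviation([1.53, 1.50, 1.55], [1.52, 1.52, 1.52]):.3f} Å")
```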

Structure description. In a space-filling representation of all atoms in the myosin S1 model (Fig. 3), the green, red, and blue segments represent parts of the heavy chain, and the yellow and magenta stretches correspond to the essential and regulatory light chains, respectively. As can be seen, the myosin head is highly asymmetric, with a length of 165 Å, a width of 65 Å, and a thickness of approximately 40 Å.

Fig. 2. A stereo view of a representative section of electron density located in the seven-stranded β sheet motif of the heavy chain, calculated with SIGMAA coefficients (29). The phases and weights used to calculate the electron density were obtained by combining the information from the heavy atom phases and those derived from the atomic model.

Table 2. Heavy atom derivatives used in the structure determination and their data collection statistics.

                        Conditions*
Derivative              Conc.   Time     Method          Rsym†   Reflections   Resolution   Rscale‡   Sites   Phasing
                        (mM)    (days)                   (%)     (no.)         (Å)          (%)       (no.)   power§
Trimethyllead acetate   20      21       Area detector    6.7    13,394        3.5          24.5       4      1.01
KAu(CN)2                1       5        Area detector    5.8    18,082        3.5          18.2       6      1.14
K2OsO4/pyridine         2-20    2        Area detector    6.7    11,804        4.0          24.5       4      1.02
K3UO2F5                 3       4        Area detector    6.7    11,688        4.0          19.1       7      1.06
Trimethyllead acetate   20      21       CHESS           11.5    34,498        2.8          28.2       4      1.19
KAu(CN)2                1       5        CHESS            9.7    36,339        2.8          17.0       8      1.23
K2OsO4/pyridine         2-20    2        CHESS           10.0    32,554        2.8          32.2       8      0.99
cis-Pt(NH3)2Cl2         2       3        CHESS           10.3    36,108        2.8          21.0      12      1.12
K3UO2F5                 3       4        CHESS           10.9    36,419        2.8          22.7       9      0.98
Trimethyllead acetate   15      21       SSRL            13.0    31,667        2.8          27.1       4      1.11
K3UO2F5                 2       3        SSRL            11.8    33,043        2.8          22.2       6      0.88

*The heavy atom derivatives were prepared at 4°C by first slowly transferring the crystals to a synthetic mother liquor composed of 1.5 M ammonium sulfate and 500 mM KCl buffered with 20 mM Pipes at pH 6.7. †$R_{\mathrm{sym}} = \sum |I_{hi} - \bar{I}_h| / \sum I_{hi} \times 100$, where $I_{hi}$ and $\bar{I}_h$ are the intensities of the individual and mean structure factors, respectively. ‡$R_{\mathrm{scale}} = \sum \big| |F_H| - |F_N| \big| / \sum |F_N| \times 100$, where $F_H$ and $F_N$ are the heavy atom and native structure factors. §The phasing power is defined as the mean value of the heavy atom structure factor divided by the residual lack-of-closure error.
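The statistics defined in the footnotes to Table 2 translate directly into code. In this Python sketch the function names are hypothetical, and the input values are made up. For the phasing power, the "residual lack-of-closure error" is taken to be the root-mean-square lack of closure, a common convention but an assumption here.

```python
import numpy as np


def r_sym(groups):
    """R_sym (%): each group lists repeated intensity measurements I_hi of one reflection h."""
    numerator = sum(np.sum(np.abs(np.asarray(g, float) - np.mean(g))) for g in groups)
    denominator = sum(np.sum(g) for g in groups)
    return 100.0 * numerator / denominator


def r_scale(f_derivative, f_native):
    """R_scale (%): sum(| |F_H| - |F_N| |) / sum(|F_N|); F_H from the derivative crystal."""
    f_h = np.abs(np.asarray(f_derivative, float))
    f_n = np.abs(np.asarray(f_native, float))
    return 100.0 * np.sum(np.abs(f_h - f_n)) / np.sum(f_n)


def phasing_power(f_heavy_atom_calc, lack_of_closure):
    """Mean heavy-atom structure-factor amplitude over the rms lack-of-closure error."""
    mean_fh = np.mean(np.abs(np.asarray(f_heavy_atom_calc, float)))
    rms_e = np.sqrt(np.mean(np.square(np.asarray(lack_of_closure, float))))
    return mean_fh / rms_e


print(r_sym([[100, 110, 90], [50, 50]]))
print(r_scale([110, 45], [100, 50]))
print(phasing_power([20, 40], [10, 10]))
```

A phasing power near 1, as for most entries in Table 2, means the heavy atom signal is only about as large as the residual error, which is one reason phase information from several derivatives and data sources had to be combined.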

Previous knowledge of the organization of the heavy chain in the myosin head was derived from proteolytic studies. Limited tryptic digestion of vertebrate skeletal S1 indicated that the head contained three major regions: a 25-kD NH2-terminal nucleotide binding region (32), a central 50-kD segment, and a 20-kD COOH-terminal segment; the last two were shown to bind to actin (33, 34). These proteolytic segments are displayed in green, red, and blue, respectively (Fig. 3); the light chains abut one another and are wrapped around a single α helix of the heavy chain but do not overlap to any significant extent.

The secondary structure of the myosin head is dominated by α helices, with approximately 48 percent of the amino acid residues in this conformation (Figs. 4 and 5). One key structural feature is the long (approximately 85 Å) α helix that extends from the thick part of the head down to the COOH-terminus of the heavy chain. This α helix constitutes the light chain binding region of the heavy chain. There is a bend, delineated by amino acid residues Trp829, Pro830, Yrp8^\, and Met«^^, which connects this long α helix to a short COOH-terminal α helix of the 95-kD heavy chain fragment. A brief description of the three polypeptide chains constituting the myosin head is given below.

Fig. 3. A space-filling representation of all of the atoms in the current model of myosin S1. The model is oriented such that the actin binding surface is located at the lower right-hand corner. The 25-, 50-, and 20-kD segments of the heavy chain are colored green, red, and blue, respectively, whereas the essential and regulatory light chains are shown in yellow and magenta, respectively. In this orientation the prominent horizontal cleft that divides the central 50-kD segment of the heavy chain into two domains (upper and lower, as defined by this orientation) is clearly visible. This figure was prepared with the molecular graphics program MIDAS (61).

Fig. 4. A ribbon representation of the entire model for myosin S1. In this and all successive figures, 2000 and 3000 have been added to the residue numbers of the regulatory and essential light chains, respectively, to distinguish these from the heavy chain. Heavy chain residues Asp* to Glu204, Gly216 to Tyr^e^, and Gln®*^ to Lys843 are colored green, red, and blue, respectively. These segments are separated by disordered loops for which no density is evident in the current map. There are two additional segments in the heavy chain for which the density is weak or disordered. These include residues Lys^^^ to Lys^^"^ and Ile^^^ to Phe^^/. The A2 isozyme of the essential light chain, shown in yellow, theoretically contains 149 amino acid residues. In the model it extends from residue Asp^ to Val"*® and contains one ill-defined region that includes residues Leu^° to Ala^. The regulatory light chain, which is colored magenta, theoretically consists of 166 amino acid residues. In the current model it extends from residue Phe^^ to Lys^®^ but is disordered between residues Pro'"^^ and Asn""*^. In this figure the molecule is oriented perpendicular to its long axis and rotated to view along the active site pocket. A sulfate ion, shown here in a space-filling representation, is located at the base of the pocket. The actin binding surface has been defined as indicated on the figure by the location of the 50- to 20-kD junction (residues Tyr^^® and Gln^^) and by its interaction with actin (46). Figures 4 to 7 were prepared with the molecular graphics program MOLSCRIPT (62).

The regulatory light chain is located at the end of the molecule distal from the nucleotide binding site (Figs. 4 and 5). It consists of two domains and shares considerable structural homology with calmodulin and troponin C, except that the long connecting helix observed in calmodulin and troponin C is distorted (35, 36). A comparison of the regulatory light chain with calmodulin is shown in Fig. 6A, where the eight helices that make up the two domains have been labeled A through H. The regulatory light chain is arranged such that its NH2-terminal domain wraps around the COOH-terminus of the heavy chain between amino acid residues Asn®^^ and Leu^"*^, whereas its COOH-terminal domain interacts with the heavy chain in the region defined by amino acid residues Glu®°^ to Val^^^. The interaction of the NH2-terminal domain with the heavy chain is stabilized by a cluster of hydrophobic residues including nine phenylalanines, two tryptophans, and four methionines. Five of these residues are contributed by the heavy chain. Superposition of the NH2-terminal domains of the regulatory light chain and calmodulin reveals an rms difference in the positions of 59 equivalent residues of only 1.3 Å. By contrast, the COOH-terminal domain is less similar to the structure observed in calmodulin, owing to a difference in the positions of the F and G helices in the regulatory light chain, which have moved to accommodate the heavy chain. In addition, the COOH-terminal domain as a whole has rotated, relative to calmodulin, about the midpoint between the two domains in order to form a tight complex with the heavy chain.
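The article does not state how the 1.3 Å rms difference was computed. A standard way to compare two sets of equivalent atoms (for example, the Cα positions of the 59 matched residues, which is an assumption here) is least-squares superposition by the Kabsch algorithm followed by an rms calculation. A minimal NumPy sketch, with a hypothetical function name and synthetic coordinates:

```python
import numpy as np


def superposed_rmsd(P, Q):
    """RMSD after optimal rigid-body superposition of P onto Q (Kabsch algorithm).

    P and Q are (N, 3) arrays of equivalent atom positions in the same order.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)            # remove translation by centring
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                       # 3 x 3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0   # avoid a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T             # optimal proper rotation
    diff = P @ R.T - Q
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


# Usage: a rotated and translated copy superposes with (near) zero rmsd
rng = np.random.default_rng(1)
coords = rng.normal(size=(59, 3)) * 5.0
theta = np.radians(30.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
moved = coords @ rot_z.T + np.array([10.0, -4.0, 2.5])
print(round(superposed_rmsd(coords, moved), 6))
```

The determinant check matters for protein comparisons: without it the fit could return a mirror image, which is not a physically meaningful superposition.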

The divalent cation binding site is located in the first helix-loop-helix motif observed in the amino acid sequence and, as indicated above, has a conformation similar to that observed in calmodulin. A divalent cation is clearly evident in the electron density and is most likely Mg2+, since this was a minor constituent of the crystallization buffer. In our model, no electron density was observed for the first 18 amino acid residues of the regulatory light chain. This segment includes Ser^^, which, by sequence homology to rabbit myosin, is the site of phosphorylation by myosin light chain kinase (37). Presumably this portion of the polypeptide chain is flexible in S1 and perhaps only plays a functional role when the head is attached to the remainder of the molecule. The observed NH2- and COOH-terminal residues of the regulatory light chain lie close to the interface between the two domains.

The essential light chain interacts with the long α helix of the heavy chain through amino acid residues Leu^^^ to Met^°^ (Fig. 6C). Likewise, it wraps around the heavy chain α helix, but in a manner different from that observed for the regulatory light chain. Its arrangement resembles that for the interaction of calmodulin with a target peptide from myosin light chain kinase (38). It differs in that the second and third helices in the NH2-terminal domain abut the heavy chain with their external surfaces, whereas the corresponding secondary structural elements in calmodulin enclose the respective target peptide. The electron density for this part of the molecule is the least well ordered of the entire map. Indeed, very little of the essential light chain was visible in the original electron density map, and it only appeared after the phase information from the rest of the molecule was included. This could be due either to lack of isomorphism in the heavy atom derivative phases or to conformational flexibility of the polypeptide chain. It is difficult to distinguish between these two possibilities because the crystals contain both classes of essential light chain isoforms. As with the regulatory light chain, the NH2- and COOH-terminal residues lie close to the interface between the two domains.

Fig. 5. A stereo α-carbon plot of the entire myosin head in which the view has been rotated 90° with respect to Fig. 4. In this view, the active site pocket is seen as a wide depression. Selected residues have been labeled to allow the path of the chain to be followed and to identify the start and end of the secondary structural elements.

[Fig. 6 panels: (A) Regulatory Light Chain; (B) Calmodulin; (C) Essential Light Chain]

Fig. 6. (A) and (C) show ribbon representations of the regulatory and essential light chains together with the segment of the heavy chain with which they interact. The light chains are oriented such that the NH2-terminal domains have the same orientation as calmodulin, shown in (B). The coordinates for calmodulin were taken from the Brookhaven Protein Data Bank (file 3CLN) from the structure determined by Cook and co-workers (63).

Fig. 7. Stereo ribbon and α-carbon plots of the catalytic portion of the myosin head centered on the active site. In (A) and (B) the actin binding face, as defined by the position of the 50- to 20-kD junction, is located on the far side of the molecule. In (C) the molecule has been rotated 90° about the horizontal axis to reveal more clearly the relation between the active site pocket and the reactive cysteine residues. (A) A larger segment of the myosin head that reveals the overall disposition of the secondary structural elements around the nucleotide binding site. The upper domain of the 50-kD segment is shaded in gray, whereas the lower domain is shaded in black to emphasize the narrow cleft that divides them. (B) A more detailed view of the residues that form the interface between the upper and lower domains of the 50-kD segment. Marker residues are identified that allow all other residues to be located. In addition, a few of the side chains for residues that have been implicated as important in the catalytic mechanism, from amino acid sequence analyses and from chemical studies, have been included. Residues Trp''^^ and Ser^24 ti^g^ ^3^3 [^ggP, identified from photolabeling studies, lie on opposite sides of the nucleotide binding pocket. (C) The helix connecting the reactive cysteines, Cys^°^ and Cys®^^, lies at the base of a cleft at the junction between the lower domain of the 50-kD segment and the NH2-terminal 25-kD segment.

The heavy chain constitutes the entire thick portion of the myosin head and contains both the nucleotide binding site and the actin binding region. These are located on opposite sides of the protein. This part of the molecule contains a complex arrangement of secondary structural elements centered mainly around a large, mostly parallel, seven-stranded β sheet motif. The topology of this β sheet is such that strands one and six run in the opposite direction to the other five strands. The central strand corresponds to the strand-loop-helix binding motif, which has the sequence GESGAGKT (39), observed both in adenylate kinase and in the Ras protein (40). The topology and organization of the heavy chain are described below in terms of the three major tryptic fragments. However, these fragments arise from proteolytic cleavage at flexible loops and do not represent discrete structural domains.
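The phosphate binding loop belongs to a widely recognized sequence signature, the P-loop or Walker A consensus G-x(4)-G-K-[S/T], of which the myosin sequence GESGAGKT quoted above is one instance. A small Python sketch of a motif scan follows. The consensus pattern is the textbook form rather than anything defined in this article, the function name is hypothetical, and the example sequences are short illustrative fragments, not the full myosin or Ras sequences.

```python
import re

# P-loop (Walker A) consensus: G, any four residues, G, K, then S or T.
# The zero-width lookahead lets overlapping matches be reported.
P_LOOP = re.compile(r"(?=(G.{4}GK[ST]))")


def find_p_loops(sequence):
    """Return (1-based start position, matched segment) for each P-loop match."""
    return [(m.start() + 1, m.group(1)) for m in P_LOOP.finditer(sequence.upper())]


print(find_p_loops("LITGESGAGKTENTK"))   # [(4, 'GESGAGKT')]
print(find_p_loops("GAGGVGKSAL"))        # [(1, 'GAGGVGKS')]
```

A consensus match of this kind identifies only the loop sequence; the structural similarity to Ras and adenylate kinase described in the text is a separate observation that required the crystal structure.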

The first observed residue at the NH2-terminus of the heavy chain is Asp"*, which is located close to the essential light chain at the approximate center of the entire myosin molecule (Figs. 4 and 5). From here the heavy chain crosses the width of the molecule and forms a small six-stranded antiparallel β sheet motif (Lys^^ to Met®°), which is fairly independent of the rest of the head and protrudes from the molecule as a whole. The function of this domain is unknown, although it does not appear essential for motility, since it is missing in several single-headed myosin I-type molecules (41). The topology of this sheet is similar to that of the Src-homology 3 domain observed in spectrin (42). After this motif, the heavy chain forms three strands of the large β sheet motif that are connected by a series of α helices. The first two strands extend from Tyr^^^ to Tyr^^® and from Cys^" to Val^^ and are connected by a β turn. Thereafter there are three short helices prior to the fourth β strand in the sheet, which extends from Gln^^^ to Gly^'^. The third strand belongs to the COOH-terminal 20-kD fragment of the heavy chain. The fourth or central strand precedes the phosphate binding loop and is followed by a helix, Lys*^^ to Ile^^, which forms the base of the nucleotide binding pocket. The topology of this loop is essentially identical to that in the Ras protein and adenylate kinase (40). A sulfate ion is embedded in the phosphate binding loop and is located close to the position of the β phosphate observed in the complex between Ap5A [P1,P5-bis(adenosine-5'-)pentaphosphate] and adenylate kinase (Figs. 4 and 5). It is perhaps not surprising to find a sulfate ion in the nucleotide binding site, because ammonium sulfate is a competitive inhibitor of the ATPase activity (43). A break in the electron density is observed between Glu^°^ and Gly^^^ at the far end of the active site pocket. The missing segment, which contains six charged residues, occurs at the 25- to 50-kD junction and is most likely a constitutively flexible loop.

The 50-kD fragment has a complex topology that can be described as two major domains separated by a long narrow cleft, as is evident in the space-filling drawing (Fig. 3). This cleft divides the distal one-third of the myosin head into two regions, which are referred to as the upper and lower domains of the 50-kD segment (Fig. 3).

Electron density for the polypeptide chain resumes at Gly^^^ as the start of an α helix (Leu^i® to Gly^^^). This helix forms part of the nucleotide binding pocket. Thereafter, the chain loops around close to the phosphate binding site and connects up to β strands six and seven of the large β sheet motif, which extend from Gly^"*^ to His^^"^ and from Leu^^^ to Tyr^^^, respectively. Strand seven terminates in a domain composed of random coil and several short helices and extending from Glu^^^ to This region is located close to the nucleotide binding site and contains Ser'^'*, which had previously been identified by photolabeling as an active site residue (44) (Fig. 7). An α helix extending from Asp^" to Ile-^'*^ forms the top of the nucleotide binding pocket. After this domain, the polypeptide chain forms the end of the myosin head through a series of long α helices. The longest of these is 45 Å in length and extends from Val"*^^ to Leu"^^. Strand five of the large mixed β sheet follows this helix and extends from Tyr"^^^ to Ala'*^^. This strand terminates in a random coil that drops from the "upper" to the "lower" domain of the 50-kD fragment. The midpoint between the upper and lower domains is located close to G and occurs in a region of the sequence (Tyr'^^^ to Gly^^^) that is highly conserved in all myosins (45). Furthermore, the cleft itself contains many individual highly conserved residues that extend into the space between the two domains.

The lower domain is built from several
long α helices (Phe^^^ to Lys505 and Met^^^
to Glu^^^), the last of which contains a
hydrophobic bulge at Pro^^^. After another
helix (Asp^"^^ to His^^®) there is a three-stranded
antiparallel β sheet, which includes
residues Asn^^ to Lys^^^, Phe^?^ to Val^^^,
and Thr^®^ to Tyr^^°. The electron density
for Lys^", Gly^^-', and Lys^^'* is very weak,
and therefore these residues have been excluded
from the model. The segment between
Pro^^^ and Lys^^^ is one component of
the actin binding surface as defined by
Rayment et al. (46). A single segment of random
coil (Lys^ to Leu^^) passes from the lower
domain and across the cleft to form a helix-loop-helix
motif on the outer face of the
upper 50-kD domain and terminating at
There is no electron density corresponding
to amino acid residues Gly^^^ to
Phe^^. This particular stretch contains the
second major site of trypsin proteolysis and is
the junction between the 50- and 20-kD
fragments. The primary sequence in this
disordered region contains nine glycine and
five lysine residues, suggesting that it may be
a flexible region in the molecule. This site is
resistant to proteolysis in the actomyosin
complex and as such may contribute to the
actin binding interface of myosin (33). In
addition, this region has also been implicated
in actin binding from crosslinking and
kinetic studies of proteolytically cleaved protein
(34, 47).

Electron density for the polypeptide
chain resumes at Gln^^ and proceeds as a
long α helix (Ser^^° to Arg^^^) across the flat
face of the molecule toward the light chain
binding region and lies between the upper
and lower domains of the 50-kD fragment.
This helix is part of a highly conserved
segment that runs from Leu^^^ to Asn^^®. At
the end of the helix, the polypeptide chain
turns into the center of the molecule and
forms the third strand of the mixed β sheet
(His^ to Ile^'^). Thus the major tertiary
motif of the head contains contributions
from all three of the tryptic fragments. After
leaving the β sheet, the polypeptide chain
proceeds through the large surface loop defined
by Thr^^^ to Glu^^, which caps one
end of the nucleotide binding site pocket.
Subsequently, the polypeptide chain forms
two α helices lying under the nucleotide
binding site and delineated by His^® to
Asn^^s and Val™ to Arg^^. This highly
conserved segment in the sequence contains
the two sulfhydryl groups, Cys707 and
Cys697, which are more reactive than the
other 11 in the molecule and have been
given the names SH1 and SH2, respectively,
in the order of their chemical reactivity.
These two thiols can be crosslinked by oxidation
and a wide variety of bifunctional
chemical reagents differing in length from 14
to 3 Å, but only in the presence of nucleotide
(48). Indeed, formation of a covalent link
between these two groups serves to trap
Mg2+-ADP (adenosine diphosphate) in the
active site. Although these reactive sulfhydryls
have been thought to reside in a flexible
loop, the discovery that these two residues
are separated by an α helix was surprising
(Fig. 7C). This is a well-defined region
of the electron density map. The fact that
the α carbons of Cys697 and Cys707 are
approximately 18 Å apart suggests that a
rearrangement or conformational change in
this area must occur upon nucleotide binding.
This point is further emphasized by the
observation that SH1 and SH2 both lie in
small clefts that face out toward the solvent
on opposite sides of the molecule. The functional
significance of this region is also indicated
by the very high degree of amino acid
sequence conservation in this area of the
molecule.
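A rough geometric check makes the "must occur" argument concrete. The sketch below is illustrative only. It assumes idealized cysteine side-chain geometry, using textbook bond lengths and angle rather than values from this article, and it treats a reagent's quoted length as the sulfur-to-sulfur span it bridges. Under those assumptions it bounds how far apart the two α carbons could be and still be crosslinked.

```python
import math

# Idealized cysteine geometry (textbook values assumed here, not taken from
# the article): bond lengths in Å and the Cα-Cβ-Sγ bond angle in degrees.
CA_CB = 1.53
CB_SG = 1.81
ANGLE_CA_CB_SG = 114.0

OBSERVED_CA_SEPARATION = 18.0  # Å, Cα-Cα distance between SH1 and SH2 (text)


def ca_to_sg_distance() -> float:
    """Cα-to-Sγ distance within one cysteine, from the law of cosines."""
    theta = math.radians(ANGLE_CA_CB_SG)
    return math.sqrt(CA_CB**2 + CB_SG**2 - 2 * CA_CB * CB_SG * math.cos(theta))


def max_ca_separation(reagent_span: float) -> float:
    """Upper bound on the Cα-Cα distance that a reagent bridging
    `reagent_span` Å between the two sulfur atoms could still link,
    assuming both side chains point straight at each other."""
    return reagent_span + 2 * ca_to_sg_distance()


# Range of crosslinker lengths quoted in the text.
for span in (14.0, 3.0):
    reach = max_ca_separation(span)
    needs_change = reach < OBSERVED_CA_SEPARATION
    print(f"{span:4.1f} Å reagent: reach {reach:4.1f} Å, "
          f"rearrangement required: {needs_change}")
```

With these assumptions, the shortest reagents can only link thiols whose α carbons lie within roughly 9 Å of each other. That is about half the observed 18 Å separation.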

The segment that follows the reactive
sulfhydryl groups consists of a small three-stranded
antiparallel β sheet that includes
residues Arg^^^ to Tyr'^'\, Tyr"8 to Gly^^S,
and Lys^^ to Phe^^', and is associated with
two short helices. This domain is separated
from the adjacent NH2-terminal domain of
the 25-kD fragment of the heavy chain by a
distinct cleft and shows a greater association
with the COOH-terminal domain of
the essential light chain. Thereafter the
heavy chain continues as a long α helix
that shows distinct curvature, beginning at
Leu^^^ and ending at Val^^^. There is a
decided bend in the course of the polypeptide
chain resulting from the Trp®^^, Pro®^°,
Trp®^^ sequence (Figs. 4 and 5). The heavy
chain terminates at residue Lys^"^^^ after a
small α helix that lies nearly at right angles
to the preceding long helix.

Effect of the reductive methylation on the
protein structure and function. One question
that must be addressed is the effect, if any, of
reductive methylation on the conformation of
the protein. An examination of the kinetic
properties of modified myosin S1 reveals that
the protein is enzymatically active (49).
There are changes in the kinetic parameters
that are similar to those observed when only
the reactive sulfhydryl groups are alkylated
(50). The results do not suggest any major
changes in the overall conformation of the
molecule, since these would be expected to
abolish its enzymatic activity. Myosin from
most sources already contains several posttranslationally
modified amino acid residues.
For example, in chicken skeletal myosin S1,
Lys^^ is monomethylated, Lys^''^ and Lys^^^
are trimethylated, and His^^^ contains a 3-N-methylated
side chain (27). Although the
role of these modified residues is unknown, it
has been suggested that methylation of Lys^-^^
provides a permanent positive charge that
may become buried when nucleotide is bound
(51). In our structure, Lys^-'^ is exposed to the
solvent at the edge of the nucleotide binding
pocket. However, this region of the protein
probably rearranges when nucleotide binds,
because the adjacent Trp^-^^ is photolabeled
by two purine ATP analogues (52).

The structure of methylated lysozyme is
essentially identical to that of the native
protein (15). From this it is not expected
that the folding motifs in myosin S1 will be
significantly affected by this treatment.
There is, however, the possibility that the
relation between the various domains could
be altered. Our data reveal that almost all of
the lysine residues are located at the surface
and hence would not be expected to influence
the structure in any major way. The A2
isozyme of chicken myosin S1 contains 102
lysine residues (27), of which 85 have been
built into the model. Of those, 67 are
located at the surface of the protein and only
18 participate in salt bridges and can be
considered buried. Five of these lysines participate
in crystalline contacts. The remaining
17 lysine residues in the A2 isozyme are
located in disordered loops; in our structure
all except 4 lysine residues are reproducibly
modified under conditions where 100 percent
dimethylation is expected (Table 1)
(15). Thus, the unmodified lysine residues
are most likely located in salt bridges, where
they would be expected to have a higher
pKa. Mass for the additional methyl groups
on the lysine residues is evident in the
electron density map for most of the well-ordered
side chains. However, at this resolution
it is difficult to categorically decide if a
residue has been modified based on the
density alone. Even so, it appears that
Lys185, which resides in the phosphate binding
loop, is not modified.
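The lysine counts above can be cross-checked with simple arithmetic. This sketch uses only the numbers quoted in the text.

```python
# Lysine bookkeeping for the A2 isozyme of chicken myosin S1, using only the
# counts quoted in the text.
TOTAL_LYSINES = 102
MODELED = 85
SURFACE = 67
BURIED_IN_SALT_BRIDGES = 18

# Every modeled lysine is either at the surface or buried in a salt bridge.
assert SURFACE + BURIED_IN_SALT_BRIDGES == MODELED

# Lysines not built into the model are the ones in disordered loops.
disordered = TOTAL_LYSINES - MODELED
surface_fraction = SURFACE / MODELED

print(f"lysines in disordered loops: {disordered}")
print(f"modeled lysines at the surface: {surface_fraction:.0%}")
```

The arithmetic confirms two points. The "remaining 17" lysines are exactly those missing from the model. Roughly four out of five modeled lysines are exposed at the surface.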

Active site and possible mechanism for
muscle contraction. The catalytic site of the
myosin head was identified by analogy to the
phosphate binding loop in both the Ras protein
and adenylate kinase and by the position
of the amino acid residues previously identified
by chemical studies with ATP analogues
(52). The nucleotide binding pocket is located
on the opposite side of the head from the
proposed actin binding site and is in an open
conformation (Figs. 4, 5, and 7). The view in
Fig. 7 shows the position of the sulfate ion in
the phosphate binding loop and a few of the
amino acid residues that have been chemically
labeled, including Trp'^\, Ser'^\, Ser''^\,
and Ser'^^ (44, 52, 53). The width of the
nucleotide binding pocket at its surface is
approximately 15 Å as measured between α
carbons. Since the binding constant of myosin
for Mg2+-ATP is about 3 × 10^^ (54) and
residues on both sides of the cleft have been
photochemically labeled, it is likely that the
pocket closes when nucleotides bind in the
active site. The pocket is approximately 13 Å
wide and 13 Å deep, with an angle between
the faces of the pocket of ~40°. The base of
the cleft is located 90 Å from the COOH-terminus
of the myosin head. If the binding
face to actin remains essentially stationary,
closure of the nucleotide binding cleft could
produce a movement at the COOH-terminus
of the myosin head of approximately 60 Å.
How this rearrangement is actually accomplished
cannot be easily predicted from our
structure.
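One way to see where a movement of this size could come from is a simple rigid-body estimate. This is our assumption, not a calculation stated by the authors. Treat closure of a ~40° pocket as a rotation of the COOH-terminal region through that angle about the cleft base, 90 Å away, with the actin binding face held fixed.

```python
import math

# Rigid-body assumption (ours, not the authors' stated calculation): the
# COOH-terminal part of the head swings about the base of the nucleotide
# binding cleft while the actin binding face stays fixed.
LEVER_LENGTH = 90.0     # Å, cleft base to COOH-terminus of the head (text)
CLEFT_ANGLE_DEG = 40.0  # approximate angle between the pocket faces (text)


def displacement(radius: float, angle_deg: float) -> tuple[float, float]:
    """Return (chord, arc) for a point `radius` Å from a pivot that rotates
    through `angle_deg` degrees."""
    theta = math.radians(angle_deg)
    chord = 2 * radius * math.sin(theta / 2)  # straight-line displacement
    arc = radius * theta                      # distance along the arc
    return chord, arc


chord, arc = displacement(LEVER_LENGTH, CLEFT_ANGLE_DEG)
print(f"chord ≈ {chord:.0f} Å, arc ≈ {arc:.0f} Å")
```

Both measures come out in the low 60s of ångströms, in line with the approximately 60 Å quoted above. The same relation shows why a longer lever amplifies a fixed angular change.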

The orientation of the molecule in Fig.
5 is rotated such that the actin binding
surface is approximately perpendicular to
the page (46). Closure of the nucleotide
binding pocket would rotate the COOH-terminal
end of the heavy chain that carries
the light chains toward the viewer, which is
consistent with that expected for the start
of the power stroke. From this perspective it
appears that a major function of the light
chains is to create a longer molecule and
hence amplify the conformational changes
associated with the active site.

Muscle contraction consists of the cyclic 
attachment and detachment of the myosin 
head to the actin filament with the con- 
comitant hydrolysis of ATP. From the ex- 
tensive kinetic studies on the interaction of 
myosin with actin (55), a general picture of
the sequence of kinetic events occurring 
during muscle contraction has emerged. 

Transient kinetic measurements originally
demonstrated that transduction of the chemical
energy released by the hydrolysis of ATP
into directed mechanical force occurred during
product release rather than during the
hydrolysis step itself (56). The cycle of events
was summarized as follows: Mg2+-ATP rapidly
dissociates the actomyosin complex by
binding to the ATPase sites of myosin; free
myosin then hydrolyzes ATP and forms a
relatively stable myosin-products complex; actin
recombines with this complex and dissociates
the products, thereby forming the original
actin-myosin complex. Presumably, force
is generated during the last step. Although
this model provided an important conceptual
framework for studies of the contractile cycle,
it soon became clear that the interactions
between myosin, actin, and the substrate and
products were more complex (55).
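The four-step cycle summarized above can be written as a small state machine. This is an illustrative sketch only. The state labels are our shorthand (A for actin, M for myosin, Pi for inorganic phosphate). As the text notes, the real interactions are more complex than this scheme.

```python
from enum import Enum


class State(Enum):
    """States of the simplified cycle (shorthand labels are ours)."""
    AM = "A·M"                # actomyosin complex
    M_ATP = "M·ATP"           # ATP bound, actin dissociated
    M_ADP_PI = "M·ADP·Pi"     # relatively stable myosin-products complex
    AM_ADP_PI = "A·M·ADP·Pi"  # actin recombined with the products complex


# Next state and a description of each step, in the order given in the text.
CYCLE = {
    State.AM: (State.M_ATP, "Mg-ATP binds and rapidly dissociates actin"),
    State.M_ATP: (State.M_ADP_PI, "free myosin hydrolyzes ATP"),
    State.M_ADP_PI: (State.AM_ADP_PI, "actin recombines with the complex"),
    State.AM_ADP_PI: (State.AM, "products released; presumed force step"),
}


def run_cycle(start: State = State.AM, steps: int = 4) -> list[State]:
    """Walk `steps` transitions from `start`, printing each one, and return
    the list of visited states (including the start)."""
    path = [start]
    state = start
    for _ in range(steps):
        nxt, what = CYCLE[state]
        print(f"{state.value:>11} -> {nxt.value:<11} {what}")
        state = nxt
        path.append(state)
    return path


if __name__ == "__main__":
    run_cycle()
```

Four transitions return the system to the actomyosin complex. This reflects the cyclic attachment and detachment described in the text.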

Structural information on the conformational
changes that occur during the actomyosin
interactions is limited. Addition of ATP
causes no significant change in the amount of
secondary structure as assessed by circular
dichroism (57). The changes observed in
tryptophan fluorescence are typical of most
enzymes whose active sites are induced to fit
around their substrates. However, significant
movement within the myosin head must occur
during the ATPase activity because of the
large change in distance between the two
reactive cysteine residues (Cys697 and Cys707)
that is induced when nucleotide binds (48,
58). Recent low-angle x-ray scattering studies
also suggest a large-scale movement during
ATP hydrolysis (59).

In formulating a model for muscle contraction
from the structure of myosin S1 presented
here, it must be understood that it neither
contains nucleotide nor is bound to actin.
Most likely the crystal structure is an intermediate
between these two extremes, although
probably closer to the actin bound state.
Preliminary attempts to dock myosin (46) to
actin suggest that a better fit to the image
reconstructions of S1-decorated actin would
be obtained if the long narrow cleft between
the upper and lower 50-kD domains were to
close, thus implying that this is an important
structural feature of the molecule. In addition,
the preliminary fit implies that the actin
binding site contains components from both
the upper and lower 50-kD domains and the
first α helix from the 20-kD region. From the
location of residues Tyr^^^ and Gln^^, the
positively charged disordered segment at the
50- to 20-kD junction could readily interact
with the negatively charged amino acids at
the NH2-terminus of actin.

All the current kinetic models for the 
mechanism of muscle contraction require a 
change in the binding affinity of myosin for 
actin when ATP binds to the active site. 
Although it is difficult to predict how this 
effect can be communicated to the actin 
binding site, the structure suggests that this 
might be generated by changes in the relation 
between the upper and lower domains of the 
50-kD segment prompted by binding of the γ
phosphate. Examination of Fig. 7 reveals that
the potential binding site for the γ phosphate
would be located close to the confluence of 
the upper and lower domains of the 50-kD 
region below the current location of the sul- 
fate ion. These observations together with the 
information from docking myosin onto actin 
provide the information necessary to formu- 
late a basic structural model for muscle con- 
traction (46). 

The three-dimensional model of the
myosin S1 presented in this article provides a
molecular framework that can be used to 
address the issues of conformational changes 
during the contractile cycle and suggests 
how this molecule functions as a molecular 
motor. By a combination of molecular biol- 
ogy, in vitro motility assays, and chemical 
and kinetic studies, it should be possible to 
test these hypotheses concerning the molec- 
ular basis of motility. 

Finally, it is appropriate to consider why 
reductive methylation allows this molecule 
to crystallize. Examination of the structure 
reveals that it contains elements of flexibil- 
ity that might lead to multiple conforma- 
tions in solution, which in turn might 
prevent the formation of a crystalline lat- 
tice. It is conceivable that reductive meth- 
ylation serves to stabilize one of these con- 
formations in solution. Alternatively, re- 
ductive methylation may serve only to re- 
duce the solubility of the protein. 
bridges that extend from the myosin filament
and interact cyclically in a rowing
motion with the actin filament as adenosine
triphosphate (ATP) is hydrolyzed (1, 2).

The myosin head is an actin-activated 
adenosine triphosphatase (ATPase). Both 
solution kinetic studies and fiber experi- 
Complex and Its Implications for
Muscle Contraction 
Muscle contraction consists of a cyclical interaction between myosin and actin driven by
the concomitant hydrolysis of adenosine triphosphate (ATP). A model for the rigor complex
of F actin and the myosin head was obtained by combining the molecular structures of the
individual proteins with the low-resolution electron density maps of the complex derived by
cryo-electron microscopy and image analysis. The spatial relation between the ATP
binding pocket on myosin and the major contact area on actin suggests a working
hypothesis for the crossbridge cycle that is consistent with previous independent structural
and biochemical studies.
ABSTRACT Pairwise sequence comparison methods have
been assessed using proteins whose relationships are known
reliably from their structures and functions, as described in
the SCOP database [Murzin, A. G., Brenner, S. E., Hubbard, T.
& Chothia, C. (1995) J. Mol. Biol. 247, 536-540]. The evaluation
tested the programs BLAST [Altschul, S. F., Gish, W.,
Miller, W., Myers, E. W. & Lipman, D. J. (1990) J. Mol. Biol.
215, 403-410], WU-BLAST2 [Altschul, S. F. & Gish, W. (1996)
Methods Enzymol. 266, 460-480], FASTA [Pearson, W. R. &
Lipman, D. J. (1988) Proc. Natl. Acad. Sci. USA 85, 2444-2448],
and SSEARCH [Smith, T. F. & Waterman, M. S. (1981) J. Mol.
Biol. 147, 195-197] and their scoring schemes. The error rate
of all algorithms is greatly reduced by using statistical scores
to evaluate matches rather than percentage identity or raw
scores. The E-value statistical scores of SSEARCH and FASTA are
reliable: the number of false positives found in our tests agrees
well with the scores reported. However, the P-values reported
by BLAST and WU-BLAST2 exaggerate significance by orders of
magnitude. SSEARCH, FASTA ktup = 1, and WU-BLAST2 perform
best, and they are capable of detecting almost all relationships
between proteins whose sequence identities are >30%. For
more distantly related proteins, they do much less well; only
one-half of the relationships between proteins with 20-30%
identity are found. Because many homologs have low sequence
similarity, most distant relationships cannot be detected by
any pairwise comparison method; however, those which are
identified may be used with confidence.

Sequence database searching plays a role in virtually every
branch of molecular biology and is crucial for interpreting the
sequences issuing forth from genome projects. Given the
method's central role, it is surprising that the overall and relative
capabilities of different procedures are largely unknown. It is
difficult to verify algorithms on sample data because this
requires large data sets of proteins whose evolutionary
relationships are known unambiguously and independently of the
methods being evaluated. However, nearly all known
homologs have been identified by sequence analysis (the method
to be tested). Also, it is generally very difficult to know, in the
absence of structural data, whether two proteins that lack clear
sequence similarity are unrelated. This has meant that
although previous evaluations have helped improve sequence
comparison, they have suffered from insufficient, imperfectly
characterized, or artificial test data. Assessment also has been
problematic because high quality database sequence searching
attempts to have both sensitivity (detection of homologs) and
specificity (rejection of unrelated proteins); however, these
complementary goals are linked such that increasing one
causes the other to be reduced.

The publication costs of this article were defrayed in part by page charge
payment. This article must therefore be hereby marked "advertisement" in
accordance with 18 U.S.C. §1734 solely to indicate this fact.
Sequence comparison methodologies have evolved rapidly,
so no previously published test has evaluated modern versions
of programs commonly used. For example, parameters in
BLAST (1) have changed, and WU-BLAST2 (2), which produces
gapped alignments, has become available. The latest version
of FASTA (3) previously tested was 1.6, but the current release
(version 3.0) provides fundamentally different results in the
form of statistical scoring.

The previous reports also have left gaps in our knowledge.
For example, there has been no published assessment of
thresholds for scoring schemes more sophisticated than
percentage identity. Thus, the widely discussed statistical scoring
measures have never actually been evaluated on large
databases of real proteins. Moreover, the different scoring schemes
commonly in use have not been compared.

Beyond these issues, there is a more fundamental question:
in an absolute sense, how well does pairwise sequence
comparison work? That is, what fraction of homologous proteins
can be detected using modern database searching methods?
In this work, we attempt to answer these questions and to
overcome both of the fundamental difficulties that have
hindered assessment of sequence comparison methodologies.
First, we use the set of distant evolutionary relationships in the
SCOP: Structural Classification of Proteins database (4), which
is derived from structural and functional characteristics (5).
The SCOP database provides a uniquely reliable set of
homologs, which are known independently of sequence
comparison. Second, we use an assessment method that jointly
measures both sensitivity and specificity. This method allows
straightforward comparison of different sequence searching
procedures. Further, it can be used to aid interpretation of real
database searches and thus provide optimal and reliable
results.

Previous Assessments of Sequence Comparison. Several
previous studies have examined the relative performance of
different sequence comparison methods. The most
encompassing analyses have been by Pearson (6, 7), who compared
the three most commonly used programs. Of these, the
Smith-Waterman algorithm (8) implemented in SSEARCH (3) is the
oldest and slowest but the most rigorous. Modern heuristics
have provided BLAST (1) the speed and convenience to make
it the most popular program. Intermediate between these two
is FASTA (3), which may be run in two modes offering either
greater speed (ktup = 2) or greater effectiveness (ktup = 1).
Pearson also considered different parameters for each of these
programs.

To test the methods, Pearson selected two representative
proteins from each of 67 protein superfamilies defined by the
PIR database (9). Each was used as a query to search the
database, and the matched proteins were marked as being
homologous or unrelated according to their membership of PIR
superfamilies. Pearson found that modern matrices and
"ln-scaling" of raw scores improve results considerably. He also
reported that the rigorous Smith-Waterman algorithm worked
slightly better than FASTA, which was in turn more effective
than BLAST.

Very large scale analyses of matrices have been performed
(10), and Henikoff and Henikoff (11) also evaluated the
effectiveness of BLAST and FASTA. Their test with BLAST
considered the ability to detect homologs above a
predetermined score but had no penalty for methods which also
reported large numbers of spurious matches. The Henikoffs
searched the SWISS-PROT database (12) and used PROSITE (13)
to define homologous families. Their results showed that the
BLOSUM62 matrix (14) performed markedly better than the
extrapolated PAM-series matrices (15), which previously had
been popular.

A crucial aspect of any assessment is the data that are used
to test the ability of the program to find homologs. But in
Pearson's and the Henikoffs' evaluations of sequence
comparison, the correct results were effectively unknown. This is
because the superfamilies in PIR and PROSITE are principally
created by using the same sequence comparison methods
which are being evaluated. Interdependency of data and
methods creates a "chicken and egg" problem and means, for
example, that new methods would be penalized for correctly
identifying homologs missed by older programs. For instance,
immunoglobulin variable and constant domains are clearly
homologous, but PIR places them in different superfamilies.
The problem is widespread: each superfamily in PIR 48.00 with
a structural homolog is itself homologous to an average of 1.6
other PIR superfamilies (16).

To surmount these sorts of difficulties, Sander and Schneider
(17) used protein structures to evaluate sequence comparison.
Rather than comparing different sequence comparison
algorithms, their work focused on determining a
length-dependent threshold of percentage identity, above which all
proteins would be of similar structure. A result of this analysis 
was the HSSP equation; it states that proteins with 25% identity 
over 80 residues will have similar structures, whereas shorter 
alignments require higher identity. (Other studies also have 
used structures (18-20), but these focused on a small number 
of model proteins and were principally oriented toward eval- 
uating alignment accuracy rather than homology detection.) 
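The HSSP threshold is usually quoted in the form t(L) = 290.15 L^-0.562 percent identity for alignment lengths 10 ≤ L ≤ 80, and 25% for longer alignments. The coefficients in this sketch are an assumption taken from that commonly cited form and should be checked against ref. 17 before use. The function names are hypothetical.

```python
def hssp_threshold(length: int) -> float:
    """Percentage identity above which an alignment of `length` residues is
    taken to imply similar structure, using the commonly quoted form of the
    HSSP curve (coefficients assumed; check against ref. 17)."""
    if length < 10:
        raise ValueError("curve is only defined for alignments of >= 10 residues")
    if length > 80:
        return 25.0  # flat 25% beyond 80 residues, as stated in the text
    return 290.15 * length ** -0.562  # shorter alignments need higher identity


def above_hssp(identity_pct: float, length: int) -> bool:
    """True if an alignment lies above the HSSP threshold."""
    return identity_pct > hssp_threshold(length)


for length in (20, 40, 64, 80, 120):
    print(f"{length:4d} residues: {hssp_threshold(length):5.1f}% identity")
```

As an example, take the unrelated pair in Fig. 2, which shares 39% identity over 64 residues. That lies above the ~28% threshold this form gives at that length. Percentage identity alone can therefore point toward homology where the structures do not.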

A general solution to the problem of scoring comes from
statistical measures (i.e., E-values and P-values) based on the
extreme value distribution (21). Extreme value scoring was
implemented analytically in the BLAST program using the
Karlin and Altschul statistics (22, 23), and empirical
approaches have been recently added to FASTA and SSEARCH. In
addition to being heralded as a reliable means of recognizing
significantly similar proteins (24, 25), the mathematical
tractability of statistical scores "is a crucial feature of the BLAST
algorithm" (1). The validity of this scoring procedure has been
tested analytically and empirically (see ref. 2 and references in
ref. 24). However, all large empirical tests used random
sequences that may lack the subtle structure found within
biological sequences (26, 27) and obviously do not contain any
real homologs. Thus, although many researchers have
suggested that statistical scores be used to rank matches (24, 25,
28), there have been no large rigorous experiments on
biological data to determine the degree to which such rankings are
superior.
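The two statistical scores are closely related. An E-value is the expected number of unrelated sequences scoring at least as well by chance. The matching P-value is the probability of seeing at least one such hit. If chance hits are assumed to follow a Poisson distribution, P = 1 - exp(-E). The function name below is hypothetical.

```python
import math


def p_from_e(e_value: float) -> float:
    """P-value (chance of at least one unrelated hit scoring this well) from
    an E-value (expected number of such hits), assuming chance hits are
    Poisson distributed: P = 1 - exp(-E)."""
    return -math.expm1(-e_value)  # expm1 keeps precision for tiny E


for e in (1e-4, 0.01, 0.1, 1.0, 10.0):
    print(f"E = {e:<7g} P = {p_from_e(e):.4g}")
```

For small values the two scores are nearly equal. As E grows, P saturates at 1. A P-value can therefore track an error count only while both are small.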

A Database for Testing Homology Detection. Since the
discovery that the structures of hemoglobin and myoglobin are
very similar though their sequences are not (29), it has been
apparent that comparing structures is a more powerful (if less
convenient) way to recognize distant evolutionary
relationships than comparing sequences. If two proteins show a high
degree of similarity in their structural details and function, it
is very probable that they have an evolutionary relationship
though their sequence similarity may be low.

The recent growth of protein structure information
combined with the comprehensive evolutionary classification in
the SCOP database (4, 5) has allowed us to overcome previous
limitations. With these data, we can evaluate the performance
of sequence comparison methods on real protein sequences
whose relationships are known confidently. The SCOP database
uses structural information to recognize distant homologs, the
large majority of which can be determined unambiguously.
These superfamilies, such as the globins or the
immunoglobulins, would be recognized as related by the vast majority of the
biological community despite the lack of high sequence
similarity.

From SCOP, we extracted the sequences of domains of
proteins in the Protein Data Bank (PDB) (30) and created two
databases. One (PDB90D-B) has domains that are all <90%
identical to any other, whereas the other (PDB40D-B) has domains
that are <40% identical. The databases were created by first sorting all
protein domains in SCOP by their quality and making a list. The
highest quality domain was selected for inclusion in the
database and removed from the list. Also removed from the list
(and discarded) were all other domains above the threshold
level of identity to the selected domain. This process was
repeated until the list was empty. The PDB40D-B database
contains 1,323 domains, which have 9,044 ordered pairs of
distant relationships, or ~0.5% of the total 1,749,006 ordered
pairs. In PDB90D-B, the 2,079 domains have 53,988
relationships, representing 1.2% of all pairs. Low complexity regions
of sequence can achieve spurious high scores, so these were
masked in both databases by processing with the SEG program
(27) using recommended parameters: 12 1.8 2.0. The databases
used in this paper are available from http://sss.stanford.edu/sss/,
and databases derived from the current version of SCOP
may be found at http://scop.mrc-lmb.cam.ac.uk/scop/.
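The greedy procedure described above is a sketch under our own assumptions, alongside the pair counts quoted in the text. The names `select_representatives`, `quality`, and `identity` are hypothetical. The caller must supply a quality score per domain and a pairwise percentage-identity function; neither is specified here. Total ordered pairs are counted as n(n - 1), excluding self-comparisons.

```python
from typing import Callable, Sequence


def select_representatives(
    domains: Sequence[str],
    quality: Callable[[str], float],
    identity: Callable[[str, str], float],
    threshold: float,
) -> list[str]:
    """Greedy non-redundant subset: repeatedly keep the highest quality
    remaining domain and discard every other domain whose identity to it is
    at or above `threshold` percent, until the list is empty."""
    remaining = sorted(domains, key=quality, reverse=True)
    selected: list[str] = []
    while remaining:
        best = remaining.pop(0)
        selected.append(best)
        remaining = [d for d in remaining if identity(best, d) < threshold]
    return selected


def ordered_pairs(n: int) -> int:
    """Ordered query/target pairs among n domains, excluding self-pairs."""
    return n * (n - 1)


if __name__ == "__main__":
    for name, n_domains, n_related in (("PDB40D-B", 1323, 9044),
                                       ("PDB90D-B", 2079, 53988)):
        total = ordered_pairs(n_domains)
        print(f"{name}: {n_related:,} of {total:,} ordered pairs "
              f"({n_related / total:.1%})")
```

Counting ordered pairs this way reproduces the 1,749,006 total given for PDB40D-B. It also gives the ~0.5% and 1.2% fractions of related pairs.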

Analyses from both databases were generally consistent, but
PDB40D-B focuses on distantly related proteins and reduces the
heavy overrepresentation in the PDB of a small number of
families (31, 32), whereas PDB90D-B (with more sequences)
improves evaluations of statistics. Except where noted
otherwise, the distant homolog results here are from PDB40D-B.
Although the precise numbers reported here are specific to the
structural domain databases used, we expect the trends to be
general.

Assessment Data and Procedure. Our assessment of se- 
quence comparison may be divided into four different major 
categories of tests. First, using just a single sequence compar- 
ison algorithm at a time, we evaluated the effectiveness of 
different scoring schemes. Second, we assessed the reliability 
of scoring procedures, including an evaluation of the validity 
of statistical scoring. Third, we compared sequence compari- 
son algorithms (using the optimal scoring scheme) to deter- 
mine their relative performance. Fourth, we examined the 
distribution of homologs and considered the power of pairwise 
sequence comparison to recognize them. All of the analyses 
used the databases of structurally identified homologs and a 
new assessment criterion. 

The analyses tested BLAST (1), version 1.4.9MP, and
WU-BLAST2 (2), version 2.0a13MP. Also assessed was the FASTA
package, version 3.0i76 (3), which provided FASTA and the
SSEARCH implementation of Smith-Waterman (8). For
SSEARCH and FASTA, we used BLOSUM45 with gap penalties
-12/-1 (7, 16). The default parameters and matrix
(BLOSUM62) were used for BLAST and WU-BLAST2.

The "Coverage Vs. Error** PloL To test a particular protocol 
(comprising a program and scoring scheme), each sequence 
from the database was used as a query to search the database. 
This yielded ordered pairs of query and target sequences with 
associated scores, which were soned, on the basis of their 
scores, from best to worst. The ideal method would have 
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perfeci separation, with all of the homologs at the top of the 
list and unrelated proteins below. In practice, perfect separa- 
tion is impossible to achieve so instead one is interested in 
drawing a threshold above which there are the largest number 
of related, pairs of sequences consistent wiih an acceptable 
error rate. 

Our procedure involved measuring the coverage and error
for every threshold. Coverage was defined as the fraction of
structurally determined homologs that have scores above the
selected threshold; this reflects the sensitivity of a method.
Errors per query (EPQ), an indicator of selectivity, is the
number of nonhomologous pairs above the threshold divided
by the number of queries. Graphs of these data, called
coverage vs. error plots, were devised to understand how
protocols compare at different levels of accuracy. These
graphs share effectively all of the beneficial features of
Receiver Operating Characteristic (ROC) plots (33, 34) but
better represent the high degrees of accuracy required in
sequence comparison and the huge background of
nonhomologs.
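The coverage and EPQ definitions above translate directly into a short routine. This is a sketch under stated assumptions, and the function names are hypothetical. The caller supplies one score per query/target pair, a homology label for each pair (from SCOP, in this study), and the number of queries. Ties between scores are not merged, which simplifies the "every threshold" sweep. For E-values, where lower is better, pass `higher_is_better=False`.

```python
from typing import Sequence


def coverage_vs_error(
    scores: Sequence[float],
    is_homolog: Sequence[bool],
    n_queries: int,
    higher_is_better: bool = True,
) -> list[tuple[float, float]]:
    """Rank all query/target pairs from best to worst score and return one
    (EPQ, coverage) point per threshold placed after each pair.

    coverage = homologous pairs above threshold / all homologous pairs
    EPQ      = nonhomologous pairs above threshold / number of queries
    Ties are not merged, which is a simplification.
    """
    total_homologs = sum(is_homolog)
    if total_homologs == 0 or n_queries <= 0:
        raise ValueError("need at least one homologous pair and one query")
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=higher_is_better)
    true_hits = false_hits = 0
    curve = []
    for i in order:
        if is_homolog[i]:
            true_hits += 1
        else:
            false_hits += 1
        curve.append((false_hits / n_queries, true_hits / total_homologs))
    return curve


def coverage_at(curve: list[tuple[float, float]], max_epq: float) -> float:
    """Best coverage reachable without exceeding `max_epq` errors per query."""
    return max((cov for epq, cov in curve if epq <= max_epq), default=0.0)
```

Reading off `coverage_at` for a fixed error budget gives a single number for comparing protocols at the same level of accuracy.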

This assessment procedure is directly relevant to practical
sequence database searching, for it provides precisely the
information necessary to perform a reliable sequence database
search. The EPQ measure places a premium on score
consistency; that is, it requires scores to be comparable for different
queries. Consistency is an aspect which has been largely
Fig. 2. Unrelated proteins with high percentage identity. Hemoglobin β-chain (PDB code 1hds chain b, ref. 38,
Left) and cellulase E2 (PDB code 1tml, ref. 39, Right) have 39% identity over 64 residues, a level which is
often believed to be indicative of homology. Despite this high degree of identity, their structures strongly
suggest that these proteins are not related. Appropriately, neither the raw alignment score of 85 nor the
E-value of 1.3 is significant. Proteins rendered by RasMol (40).







Fig. 3. Length and percentage identity of alignments of unrelated proteins in PDB40D-B. Each pair of
nonhomologous proteins found with SSEARCH is plotted as a point whose position indicates the length and the
percentage identity within the alignment. Because alignment length and percentage identity are quantized, many
pairs of proteins may have exactly the same alignment length and percentage identity. The line shows the HSSP
threshold (though it is intended to be applied with a different matrix and parameters).
Fig. 4. Reliability of statistical scores in PDB40D-B. Each line shows the relationship between reported
statistical score and actual error rate for a different program. E-values are reported for SSEARCH and FASTA,
whereas P-values are shown for BLAST and WU-BLAST2. If the scoring were perfect, then the number of errors per
query and the E-values would be the same, as indicated by the upper bold line. (P-values should be the same as
EPQ for small numbers and diverge at higher values, as indicated by the lower bold line.) E-values from SSEARCH
and FASTA are shown to have good agreement with EPQ but underestimate the significance slightly. BLAST and
WU-BLAST2 are overconfident, with the degree of exaggeration dependent upon the score. The results for PDB90D-B
were similar to those for PDB40D-B despite the difference in number of homologs detected. This graph could be
used to roughly calibrate the reliability of a given statistical score.

ignored in previous tests but is essential for the straightforward or automatic interpretation of sequence
comparison results. Further, it provides a clear indication of the confidence that should be ascribed to each
match. Indeed, the EPQ measure should approximate the expectation value reported by database searching programs,
if the programs' estimates are accurate.
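The comparison between reported E-values and observed EPQ can be checked directly. In the minimal sketch below, the E-values attached to known nonhomologous hits are the only input. Counting how many fall at or below each cutoff, and dividing by the number of queries, gives the observed EPQ to set against the cutoff itself. The function names and the toy numbers are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List


def observed_epq(sorted_false_evalues: List[float], n_queries: int, cutoff: float) -> float:
    """Errors per query: nonhomologous pairs reported with E <= cutoff, per query."""
    return bisect_right(sorted_false_evalues, cutoff) / n_queries


def calibration(false_evalues: List[float], n_queries: int,
                cutoffs: List[float]) -> Dict[float, float]:
    """Ratio of observed EPQ to the reported E-value at each cutoff.

    About 1.0: well calibrated; above 1.0: the program overstates significance;
    below 1.0: the program is conservative.
    """
    ranked = sorted(false_evalues)
    return {c: observed_epq(ranked, n_queries, c) / c for c in cutoffs}


# Hypothetical E-values of the nonhomologous hits returned for 10 queries.
false_hits = [0.002, 0.05, 0.3, 0.8, 2.0]
print(calibration(false_hits, n_queries=10, cutoffs=[0.1, 1.0]))
```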

The Performance of Scoring Schemes. All of the programs tested could provide three fundamental types of scores.
The first score is the percentage identity, which may be computed in several ways based on either the length of
the alignment or the lengths of the sequences. The second is a "raw" or "Smith-Waterman" score, which is the
measure optimized by the Smith-Waterman algorithm and is computed by summing the substitution matrix scores for
each position in the alignment and subtracting gap penalties. In BLAST, a measure related to this score is scaled
into bits. Third is a statistical score based on the extreme value distribution. These results are summarized in
Fig. 1.

Fig. 1. Sequence Comparison Algorithms (PDB40D-B).
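To make the "raw" score concrete, here is a minimal Smith-Waterman local alignment scorer with affine gaps, following Gotoh's formulation. It is a sketch under two stated assumptions. First, a toy match/mismatch function stands in for BLOSUM45. Second, a gap of length k is charged gap_open + (k - 1) * gap_extend. That is one common reading of the -12/-1 penalties, not a documented SSEARCH convention.

```python
from typing import Callable


def smith_waterman(a: str, b: str, score: Callable[[str, str], int],
                   gap_open: int = 12, gap_extend: int = 1) -> int:
    """Best local alignment score of a and b.

    A gap of length k costs gap_open + (k - 1) * gap_extend (assumed convention).
    """
    neg = float("-inf")
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]    # best score of any alignment ending at (i, j)
    E = [[neg] * (m + 1) for _ in range(n + 1)]  # ... ending with a gap placed in a
    F = [[neg] * (m + 1) for _ in range(n + 1)]  # ... ending with a gap placed in b
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(E[i][j - 1] - gap_extend, H[i][j - 1] - gap_open)
            F[i][j] = max(F[i - 1][j] - gap_extend, H[i - 1][j] - gap_open)
            diag = H[i - 1][j - 1] + score(a[i - 1], b[j - 1])
            # The 0 term lets a local alignment start afresh at any cell.
            H[i][j] = max(0, diag, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


def toy_score(x: str, y: str) -> int:
    """Hypothetical stand-in for a substitution matrix such as BLOSUM45."""
    return 5 if x == y else -4


print(smith_waterman("ACDEFGHIK", "ACDEGHIK", toy_score))  # 28: eight matches (40) minus one gap (12)
```

With a real substitution matrix, score would look up the matrix entry for the residue pair. The recurrence itself is unchanged.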

Sequence Identity. Though it has been long established that percentage identity is a poor measure (35), there is
a common rule-of-thumb stating that 30% identity signifies homology. Moreover, publications have indicated that
25% identity can be used as a threshold (17, 36). We find that these thresholds, originally derived years ago,
are not supported by present results. As databases have grown, so have the possibilities for chance alignments
with high identity; thus, the reported cutoffs lead to frequent errors. Fig. 2 shows one of the many pairs of
proteins with very different structures that nonetheless have high levels of identity over considerable aligned
regions. Despite the high identity, the raw and the statistical scores for such incorrect matches are typically
not significant. The principal reasons percentage identity does so poorly seem to be that it ignores information
about gaps and about the conservative or radical nature of residue substitutions.
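The following sketch shows two properties of percentage identity. Its value depends heavily on the denominator chosen, and it counts only exact matches. The two denominators used here, aligned columns and the length of the shorter full sequence, are common choices assumed for illustration. They are not necessarily the exact definitions behind Table 1, and the example proteins are hypothetical.

```python
def identities(aln_a: str, aln_b: str) -> int:
    """Count aligned columns where both residues are identical (gaps '-' never count)."""
    return sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")


def pct_identity_alignment(aln_a: str, aln_b: str) -> float:
    """Identities divided by the number of alignment columns (gap columns included)."""
    return 100.0 * identities(aln_a, aln_b) / len(aln_a)


def pct_identity_shorter(aln_a: str, aln_b: str, len_a: int, len_b: int) -> float:
    """Identities divided by the length of the shorter full-length sequence."""
    return 100.0 * identities(aln_a, aln_b) / min(len_a, len_b)


# A short local alignment taken from two hypothetical proteins of 140 and 200 residues.
top = "ACDE-GHIK"
bottom = "ACDEFGHIR"
print(round(pct_identity_alignment(top, bottom), 1))  # 77.8
print(pct_identity_shorter(top, bottom, 140, 200))    # 5.0
```

Neither function notices that K opposite R is a conservative substitution. Nor does the gap change the identity count, except through the denominator. This is the information a substitution matrix and gap penalties add to a raw score.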

From the PDB40D-B analysis in Fig. 3, we learn that 30% identity is a reliable threshold for this database only
for sequence alignments of at least 150 residues. Because one unrelated pair of proteins has 43.5% identity over
62 residues, it is probably necessary for alignments to be at least 70 residues in length before 40% is a
reasonable threshold, for a database of this particular size and composition.

At a given reliability, scores based on percentage identity detect just a fraction of the distant homologs found
by statistical scoring. If one measures the percentage identity in the aligned regions without consideration of
alignment length, then a negligible number of distant homologs are detected. Use of the HSSP equation improves
the value of percentage identity, but even this measure can find only 4% of all known homologs at 1% EPQ. In
short, percentage identity discards most of the information measured in a sequence comparison.
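The HSSP equation makes the identity threshold depend on alignment length. The minimal sketch below assumes the commonly cited piecewise form of the HSSP curve (Sander and Schneider, 1991). The constants 290.15, -0.562 and 24.8 come from that published curve, not from this paper, and should be checked against it before use. The offset parameter is a hypothetical way to express shifted thresholds.

```python
def hssp_threshold(length: int, offset: float = 0.0) -> float:
    """Minimum percentage identity for an alignment of `length` residues.

    Assumed piecewise HSSP curve; `offset` shifts the whole curve upward.
    """
    if length < 10:
        base = 100.0                      # alignments this short are never accepted
    elif length <= 80:
        base = 290.15 * length ** -0.562  # threshold falls as alignments lengthen
    else:
        base = 24.8                       # constant floor for long alignments
    return base + offset


def passes_hssp(identity_pct: float, length: int, offset: float = 0.0) -> bool:
    """True if the alignment lies on or above the (shifted) HSSP curve."""
    return identity_pct >= hssp_threshold(length, offset)


print(round(hssp_threshold(64), 1))  # 28.0
print(passes_hssp(39.0, 64))
```

With offset=9.8, the long-alignment threshold becomes 34.6%. That would match the "35% (HSSP + 9.8)" entry in Table 1 if that entry denotes an upward shift of the curve.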

Raw Scores. Smith-Waterman raw scores perform better than percentage identity (Fig. 1), but ln-scaling (7)
provided no notable benefit in our analysis. It is necessary to be very precise when using either raw or bit
scores because a 20% change in cutoff score could yield a tenfold difference in EPQ. However, it is difficult to
choose appropriate thresholds because the reliability of a bit score depends on the lengths of the proteins
matched and the size of the database. Raw score thresholds also are affected by matrix and gap parameters.
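The standard Karlin-Altschul relations explain why a fixed bit-score cutoff behaves differently across searches: S' = (λS - ln K)/ln 2 and E = m n 2^(-S'), where m and n are the query and database lengths. The sketch below assumes those relations without edge-effect corrections. λ and K depend on the matrix and gap penalties. The values used in the example are illustrative placeholders, not parameters reported in this work.

```python
import math


def bit_score(raw: float, lam: float, k: float) -> float:
    """Normalize a raw score into bits: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw - math.log(k)) / math.log(2)


def e_value_from_bits(bits: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits at least this good: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


# Illustrative (assumed) statistical parameters; real values depend on matrix and gaps.
LAMBDA, K = 0.267, 0.041
bits = bit_score(100, LAMBDA, K)
small_db = e_value_from_bits(bits, query_len=300, db_len=10**6)
large_db = e_value_from_bits(bits, query_len=300, db_len=10**8)
# The same raw and bit score is 100 times less significant in a 100-fold larger database.
print(large_db / small_db)
```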

Statistical Scores. Statistical scores were introduced partly 
to overcome the problems that arise from raw scores. This 
scoring scheme provides the best discrimination between 
homologous proteins and those which are unrelated. Most 
Fig. 5. Coverage vs. error plots of different sequence comparison methods. Five different sequence comparison
methods are evaluated. (A) PDB40D-B database. In this analysis, the best method is the slow SSEARCH, which finds
18% of relationships at 1% EPQ. (B) PDB90D-B database. The quick WU-BLAST2 program provides the best coverage at
1% EPQ on this database, although at higher levels of error it becomes slightly worse than FASTA ktup = 1 and
SSEARCH.
likely, its power can be attributed to its incorporation of more information than any other measure; it takes
account of the full substitution and gap data (like raw scores) but also has details about the sequence lengths
and composition and is scaled appropriately.

We find that statistical scores are not only powerful, but also easy to interpret. SSEARCH and FASTA show close
agreement between statistical scores and actual number of errors per query (Fig. 4). The expectation value score
gives a good, slightly conservative estimate of the chances of the two sequences being found at random in a given
query. Thus, an E-value of 0.01 indicates that roughly one pair of nonhomologs of this similarity should be found
in every 100 different queries. Neither raw scores nor percentage identity can be interpreted in this way, and
these results validate the suitability of the extreme value distribution for describing the scores from a
database search.

The P-values from BLAST also should be directly interpretable but were found to overstate significance by more
than two orders of magnitude for 1% EPQ for this database. Nonetheless, these results strongly suggest that the
analytic theory is fundamentally appropriate. WU-BLAST2 scores were more reliable than those from BLAST, but also
exaggerate expected confidence by more than an order of magnitude at 1% EPQ.
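The Fig. 4 caption notes that P-values should track EPQ only for small values. Under the usual Poisson assumption for chance hits, P = 1 - exp(-E). That relation is sketched below. It is the generic link between the two statistics, not a description of how BLAST 1.4 or WU-BLAST2 compute their sum-statistics P-values.

```python
import math


def p_from_e(e: float) -> float:
    """P = 1 - exp(-E): probability of at least one chance hit scoring this well."""
    return -math.expm1(-e)  # expm1 keeps precision when E is tiny


def e_from_p(p: float) -> float:
    """Inverse relation: E = -ln(1 - P)."""
    return -math.log1p(-p)


for e in (0.001, 0.01, 0.1, 1.0, 10.0):
    # P is almost equal to E when E is small, but can never exceed 1.
    print(f"E = {e:<6} P = {p_from_e(e):.4f}")
```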

Overall Detection of Homologs and Comparison of Algorithms. The results in Fig. 5A and Table 1 show that pairwise
sequence comparison is capable of identifying only a small fraction of the homologous pairs of sequences in
PDB40D-B. Even SSEARCH with E-values, the best protocol tested, could find only 18% of all relationships at a 1%
EPQ. BLAST, which identifies 15%, was the worst performer, whereas FASTA ktup = 1 is nearly as effective as
SSEARCH. FASTA ktup = 2 and WU-BLAST2 are intermediate in their ability to detect homologs. Comparison of
different algorithms indicates that those capable of identifying more homologs are generally slower. SSEARCH is
25 times slower than BLAST and 6.5 times slower than FASTA ktup = 1. WU-BLAST2 is slightly faster than FASTA
ktup = 2, but the latter has more interpretable scores.

In PDB90D-B, where there are many close relationships, the best method can identify only 38% of structurally
known homologs (Fig. 5B). The method which finds that many relationships is WU-BLAST2. Consequently, we infer that
the differences between the FASTA ktup = 1, SSEARCH, and WU-BLAST2 programs are unlikely to be significant when
compared with variation in database composition and scoring reliability.

Fig. 6 helps to explain why most distant homologs cannot be found by sequence comparison: a great many such
relationships have no more sequence identity than would be expected by chance. SSEARCH with E-values can
recognize >90% of the homologous pairs with 30-40% identity. In this region, there are 30 pairs of homologous
proteins that do not have significant E-values, but 26 of these involve sequences with <50 residues. Of sequences
having 25-30% identity, 75% are identified by SSEARCH E-values. However, although the number of homologs grows at
lower levels of identity, the detection falls off sharply: only 40% of homologs with 20-25% identity
Fig. 6. Distribution and detection of homologs in PDB40D-B. Bars show the distribution of homologous pairs in
PDB40D-B according to their identity (using the measure of identity in both). Filled regions indicate the number
of these pairs found by the best database searching method (SSEARCH with E-values) at 1% EPQ. The PDB40D-B
database contains proteins with <40% identity, and as shown on this graph, most structurally identified homologs
in the database have diverged extremely far in sequence and have <20% identity. Note that the alignments may be
inaccurate, especially at low levels of identity. Filled regions show that SSEARCH can identify most
relationships that have 25% or more identity, but its detection wanes sharply below 25%. Consequently, the great
sequence divergence of most structurally identified evolutionary relationships effectively defeats the ability
of pairwise sequence comparison to detect them.

are detected and only 10% of those with 15-20% can be found. These results show that statistical scores can find
related proteins whose identity is remarkably low; however, the power of the method is restricted by the great
divergence of many protein sequences.

After completion of this work, a new version of pairwise BLAST was released: BLASTGP (37). It supports gapped
alignments, like WU-BLAST2, and dispenses with sum statistics. Our initial tests on BLASTGP using default
parameters show that its E-values are reliable and that its overall detection of homologs was substantially
better than that of ungapped BLAST, but not quite equal to that of WU-BLAST2.

CONCLUSION 

The general consensus amongst experts (see refs. 7, 24, 25, 27 and references therein) suggests that the most
effective sequence searches are made by (i) using a large current database in which the protein sequences have
been complexity masked and (ii) using statistical scores to interpret the results. Our experiments fully support
this view.
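Complexity masking can be illustrated with a simplified sliding-window entropy filter. This is only a stand-in loosely modeled on SEG. The window of 12 residues and the 2.2-bit entropy trigger are assumed to resemble commonly used SEG settings. The real SEG algorithm also extends and trims low-complexity segments, which this sketch does not attempt, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Compositional entropy of a residue window, in bits."""
    n = len(window)
    return -sum(c / n * math.log2(c / n) for c in Counter(window).values())


def mask_low_complexity(seq: str, window: int = 12, max_entropy: float = 2.2,
                        mask_char: str = "X") -> str:
    """Replace every residue covered by a window whose entropy is below max_entropy."""
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < max_entropy:
            masked[start:start + window] = mask_char * window
    return "".join(masked)


# A polyglutamine run flanked by diverse sequence: the run and part of each flank are masked.
print(mask_low_complexity("ACDEFGHIKLMN" + "Q" * 15 + "ACDEFGHIKLMN"))
```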

Our results also suggest two further points. First, the E-values reported by FASTA and SSEARCH give fairly
accurate estimates of the significance of each match, but the P-values provided by BLAST and WU-BLAST2
underestimate the true extent of errors. Second, SSEARCH, WU-BLAST2 and FASTA ktup = 1 perform best, though BLAST
and FASTA ktup = 2 detect most of the relationships found by the best procedures and are appropriate for rapid
initial searches.

Table 1. Summary of sequence comparison methods with PDB40D-B

Method                                  Relative time*   1% EPQ cutoff       Coverage at 1% EPQ (%)
SSEARCH % identity: within alignment    25.5                                 <0.1
SSEARCH % identity: within both         25.5             34%                 3.0
SSEARCH % identity: HSSP-scaled         25.5             35% (HSSP + 9.8)    4.0
SSEARCH Smith-Waterman raw scores       25.5             142                 10.5
SSEARCH E-values                        25.5             0.03                18.4
FASTA ktup = 1 E-values                 3.9              0.03                17.9
FASTA ktup = 2 E-values                 1.4              0.03                16.7
WU-BLAST2 P-values                      1.1              0.003               17.5
BLAST P-values                          1.0              0.00016             14.8

*Times are from large database searches with genome proteins.

The homologous proteins that are found by sequence comparison can be distinguished with high reliability from the
huge number of unrelated pairs. However, even the best database searching procedures tested fail to find the
large majority of distant evolutionary relationships at an acceptable error rate. Thus, if the procedures
assessed here fail to find a reliable match, it does not imply that the sequence is unique; rather, it indicates
that any relatives it might have are distant ones.**



**Additional and updated information about this work, including supplementary figures, may be found at
http://sss.stanford.edu/sss/.
hmmpfam - search a single seq against HMM database
HMMER 2.1.1 (Dec 1998) 

Copyright (C) 1992-1998 Washington University School of Medicine 
HMMER is freely distributed under the GNU General Public License (GPL) 



HMM file:                 /data/isb2k/blastdb/Pfain72/Pfain72
Sequence file:            /u/legal/jennyb/pf621.seq



Query: 1929760CD1 

Scores for sequence family classification (score includes all domains):
Model        Description                    Score    E-value  N
myosin_head  Myosin head (motor domain)    -188.6    1.2e-19  1



Parsed for domains:
Model        Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
myosin_head    1/1       5   388        1   734 []  -188.6  1.2e-19



Alignments of top-scoring domains: 

myosin_head: domain 1 of 1, from 5 to 388: score -188.6, E = 1.2e-19 

*->vEDmveLtyLnEpsvlhNLKkRYksdlIYTYsGlvLvsvNPYkrLpq 

P 

1929760CD1 5 P- 5 



1929760CD1 



1929760CD1 



1929760CD1 



iYteeiiakYrGKrryElPPHiFAiADeAYRsMlsdkeNQsillSGESGA 



GKTEntKkvmqYlAaVsggnsgngeevpsvkvgrvEdqlLqsNPiLEAFG 
Vs++ s 

6 _^____QVSCSLS 12 

NAKTtRNNNSSRFGKyielqFdktGkivGaklenYLLEKSRVvyQtegER 



NFHIFYQLLaGasqqnlkkeLkLtndpedYhYLnqggevkpcytvdGiDD 
+++ L +++ qg+ 
1929760CD1 13 LMPR LP-SIRHW QGP 26 

segnveeFketrkAmdilGf tdeeqrsIFrivAalLhlGNikFkqrrkee 
s 

1929760CD1 27 S 27 



1929760CD1 



aaipddnnadtkal ekaaeLlGvda telekALl srr i ktG tegrkS tvtk 
+++ ++ G + 
28 HPG FL GPLF 36 



pqnveQAsyARDALAKalYsRLFdWIVnrlNktLdfkakegqdasf IGVL 
p +L+ +++ a f G L 

1929760CD1 37 PI CSLQWPHGFS--AIFPGLL 55 

DlyGFEIFekNSFEQLCINYvNEKLQQfFNhhmFklEQEEYkrEGIeWtf 
D yGFE F +NS EQLCINY+NEKLQQ+F h + + QEEY EG eW+f 
1929760CD1 56 DVYGFESFPDNSLEQLCINYANEKLQQHFVAHYLRAQQEEYAVEGLEWSF 105 

IdFgdNLQpcIDLIEkKsPpGILsLLDEeClfPkaqSGtDqtFldKLyst 
1++ dN Qpc DLIE+ P+ I sL+ EeC++ + + + + + 
1929760CD1 106 INYQDN-QPCLDLIEGS-PISICSLINEECRLNRPS--SARQLQTRIETA 151 



1929760CD1 



152 



fskhpahfekf sPrfrqkksgahFiikHYAGdVeYnvegFleKNKDpLfd 
+ p + + + s Fi++HYAG V+Y + g +eKNKDp+++ 
LAGSPCLGHN KLSREPS FIWHYAGPVRYHTAGLVEKNKDPIPP 



195 
17. I declare further that all statements made herein of my own knowledge are true
and that all statements made herein on information and belief are believed to be true; and further, that 
these statements were made with the knowledge that willful false statements and the like so made are 
punishable by fine or imprisonment, or both, and that willful false statements may jeopardize the validity 
of this application and any patent issuing thereon. 



Tod Bedilion 



Signed at Redwood City, California 
this day of July, 2003 
dlisllksSsnpllaeLFpdeetlagpf eadpsslskkrksgskNkstgk
+1 ll++S++pll++LFp P++++++ + 
1929760CD1 196 ELTRLLQQSQDPLLMGLFP TNPKEKTQEE P 225 

ktkksnf i . TvGaqf KeslneLMktLsstnLPHFvRCIkPNekKkagvf D 
++++ + Tv ++fK si L+ L St+ PH++RCIkPN+ +a +f 
1929760CD1 226 PGQSRAPVlTWSKFKASLEQLLQVLHSTT-PHYIRCIKPNSQGQAQTFL 274 

aslVlhQLrclGVLEgiRIrRaGFPnRitfdeFlqRYriLapktwP. . . . 
++ Vl+QL ++G+ E+i 1+ aGFP R+ + F++RY++L + ++++ 
1929760CD1 275 QEEVLSQLEACGLVETIHISAAGFPIRVSHRWFVERYKLLRRLHPCtSsg 324

kwsgdakkgeknEIvaceklLqsLn 

+++++++ ++W +++ +e l+q+ ++ + ++ ++ 

1929760CD1 325 pdspypakglpEWCPHSEEA TLEPLIQDILhtlpvltqaaaitg 368

IDkgeeyrf GkTKIFFR<-* 

++ + + + + G TK+F 
1929760CD1 369 dsaeaMPA-PMH-CGRTKVFMT 388
IN THE UNITED STATES PATENT AND TRADEMARK OFFICE




In re Application of: Tang et al. 
Title: MYOSIN HEAVY CHAIN HOMOLOG 

Serial No.: 09/830,914 Filing Date: May 01, 2001 

Examiner: Fronda, C. Group Art Unit: 1652 

Mail Stop: Non-Fee Amendment 
Commissioner for Patents 
P. O. Box 1450 
Alexandria, VA 22313-1450 

DECLARATION OF DR. TOD BEDILION 
UNDER 37 C.F.R. §1.132 

I, TOD BEDILION, a citizen of the United States, residing at 132 Winding Way, San 
Carlos, California, declare that: 

1. I was employed by Incyte Genomics, Inc. (hereinafter "Incyte") as a Director
of Corporate Development until May 11, 2001. I am currently under contract to be a Consultant to
Incyte Genomics, Inc.

2. In 1996, I received a Ph.D. degree in Cell, Molecular and Development
Biology from UCLA. I had previously received, in 1988, a B.S. degree in biology from UCLA.

Upon my graduation from UCLA, I became, in April 1996, the first employee of
Synteni, Inc. (hereinafter "Synteni"). I was a Research Director at Synteni from April 1996 until
Synteni was acquired by Incyte in early 1998.

I understand that Synteni was founded in 1994 by T. Dari Shalon while he was a
graduate student at Stanford University. I further understand that Synteni was founded for the purpose
of commercially exploiting certain "cDNA microarray" technology that was being worked on at
Stanford in the early to mid-1990s. That technology, which I will sometimes refer to herein as the
"Stanford-developed cDNA microarray technology", was the subject of Dr. Shalon's doctoral thesis at
Stanford. I understand and believe that Dr. P.O. Brown was Dr. Shalon's thesis advisor at Stanford.

During the period beginning before I was employed by Synteni and ending upon its
acquisition by Incyte in early 1998, I understand Synteni was the exclusive licensee of the Stanford-
developed cDNA microarray technology, subject to any right that the United States government may
have with respect to that technology. In early 1998, I understand Incyte acquired rights under the
Stanford-developed cDNA microarray technology as part of its acquisition of Synteni.

I understand that at the time of the commencement of my employment at Synteni in 
April 1996, Synteni's rights with respect to the Stanford-developed cDNA technology included rights 
under a United States patent application that had been filed June 7, 1995 in the names of Drs. Brown 
and Shalon and that subsequently issued as United States Patent No. 5,807,522 (the Brown '522 
patent). In December 1995, the subject matter of the Brown '522 patent was published based on a 
PCT patent application that had also been filed in June 1995. The Brown '522 patent (and its 
corresponding PCT application) describes the use of the Stanford-developed cDNA technology in a 
number of gene expression monitoring applications, as will be discussed more fully below. 

Upon Incyte's acquisition of Synteni, I became employed by Incyte. From early 1998
until late 1999, I was an Associate Research Director at Incyte. In late 1999, I was promoted to the
position of Director, Corporate Development.

I have been aware of the Stanford-developed cDNA microarray technology since 
shortly before I commenced my employment at Synteni. While I was employed by Synteni, virtually all 
(if not all) of my work efforts (as well as the work efforts of others employed by Synteni) were directed 
to the further development and commercial exploitation of that cDNA microarray technology. By the 
end of 1997, those efforts had progressed to the point that I understand Incyte agreed to pay at least 
about $80 million to acquire Synteni. Since I have been employed by Incyte, I have continued to work 
on the further development and commercial exploitation of the cDNA microarray technology that was
first developed at Stanford in the early to mid-1990s. 

3. I have reviewed the specification of a United States patent application that I 
understand was filed on May 2, 2001 in the names of Tang et al. and was assigned Serial No. 
09/830,914 (hereinafter "the Tang '914 application"). Furthermore, I understand that this United 
States patent application claimed priority to United States Provisional Patent Application Serial No. 
60/172,248 filed on November 5, 1998 (hereinafter "the Tang '248 application"). The SEQ ID
NO:1-encoding polynucleotides were described in the Tang '248 application. My remarks herein will
therefore be directed to the Tang '248 patent application, and November 5, 1998, as the relevant date 
of filing. In broad overview, the Tang '248 specification pertains to certain nucleotide and amino acid 
sequences and their use in a number of applications, including gene expression monitoring applications 
that are useful in connection with (a) developing drugs (e.g., the diagnosis of inherited and acquired 
genetic disorders, expression profiling, toxicology testing, and drug development with respect to cancer, 
an immunopathology, a neuropathology, and the like), and (b) monitoring the activity of drugs for 
purposes relating to evaluating their efficacy and toxicity. 

4. I understand that (a) the Tang '248 application contains claims that are directed
to isolated and purified polynucleotides having the sequences disclosed in the Tang '914 application as
SEQ ID NO:1-encoding polynucleotides, for example SEQ ID NO:2 (hereinafter "the SEQ ID
NO:1-encoding polynucleotides"), and (b) the Patent Examiner has rejected those claims on the grounds that
the specification of the Tang '248 application does not disclose a substantial, specific and credible utility
for the claimed SEQ ID NO:1-encoding polynucleotides. I further understand that whether or not a
patent specification discloses a substantial, specific and credible utility for its claimed subject matter is
properly determined from the perspective of a person skilled in the art to which the specification
pertains at the time the patent application was filed. In addition, I understand that a substantial,
specific and credible utility under the patent laws must be a "real-world" utility.
5. I have been asked (a) to consider with a view to reaching a conclusion (or
conclusions) as to whether or not I agree with the Patent Examiner's position that the Tang '248
application does not disclose a substantial, specific and credible "real-world" utility for the claimed
SEQ ID NO:1-encoding polynucleotides, and (b) to state and explain the bases for any conclusions I
reach. I have been informed that, in connection with my considerations, I should determine whether or
not a person skilled in the art to which the Tang '248 application pertains on November 5, 1998 would
have concluded that the Tang '248 application disclosed, for the benefit of the public, a specific
beneficial use of the SEQ ID NO:1-encoding polynucleotides in their then available and disclosed form.
I have also been informed that, with respect to the "real-world" utility requirement, the Patent and
Trademark Office instructs its Patent Examiners in Section 2107 of the Manual of Patent Examining
Procedure, under the heading "I. 'Real-World Value' Requirement":

"Many research tools such as gas chromatographs, screening assays, and 
nucleotide sequencing techniques have a clear, specific and unquestionable utility (e.g., 
they are useful in analyzing compounds). An assessment that focuses on whether an 
invention is useful only in a research setting thus does not address whether the specific 
invention is in fact 'useful' in a patent sense. Instead, Office personnel must distinguish 
between inventions that have a specifically identified utility and inventions whose 
specific utility requires further research to identify or reasonably confirm." 

6. I have considered the matters set forth in paragraph 5 of this Declaration and
have concluded that, contrary to the position I understand the Patent Examiner has taken, the
specification of the Tang '248 patent application disclosed to a person skilled in the art at the time of its
filing a number of substantial, specific and credible real-world utilities for the claimed SEQ ID
NO:1-encoding polynucleotides. More specifically, persons skilled in the art on November 5, 1998 would
have understood the Tang '248 application to disclose the use of the SEQ ID NO:1-encoding
polynucleotides in a number of gene expression monitoring applications that were well-known at that 
time to be useful in connection with the development of drugs and the monitoring of the activity of such 
drugs. I explain the bases for reaching my conclusion in this regard in paragraphs 7-16 below. 
7. In reaching the conclusion stated in paragraph 6 of this Declaration, I
considered (a) the specification of the Tang '248 application, and (b) a number of published articles 
and patent documents that evidence gene expression monitoring techniques that were well-known 
before the November 5, 1998 filing date of the Tang '248 application. The published articles and 
patent documents I considered are: 

(a) Schena, M., Shalon, D., Heller, R., Chai, A., Brown, P.O., and Davis,
R.W., Parallel human genome analysis: Microarray-based expression monitoring of 1000 genes, Proc.
Natl. Acad. Sci. USA, 93, 10614-10619 (1996) (hereinafter "the Schena 1996 article") (copy
annexed at Tab A);

(b) Schena, M., Shalon, D., Davis, R.W., Brown, P.O., Quantitative
Monitoring of Gene Expression Patterns with a Complementary DNA Microarray, Science, 270,
467-470 (1995) (hereinafter "the Schena 1995 article") (copy annexed at Tab B);

(c) Shalon and Brown PCT patent application WO 95/35505 titled 
"Method and Apparatus For Fabricating Microarrays Of Biological Samples," filed on June 16, 1995, 
and published on December 28, 1995 (hereinafter "the Shalon PCT application") (copy annexed at 
Tab C); 

(d) Brown and Shalon U.S. Patent No. 5,807,522, corresponding to the
Shalon PCT application, titled "Methods For Fabricating Microarrays Of Biological Samples," filed on
June 7, 1995 and issued on September 15, 1998 (hereinafter "the Brown '522 patent") (copy annexed
at Tab D);

(e) DeRisi, J., Penland, L., and Brown, P.O. (Group 1); Bittner, M.L.,
Meltzer, P.S., Ray, M., Chen, Y., Su, Y.A., and Trent, J.M. (Group 2), Use of a cDNA microarray to
analyse gene expression patterns in human cancer, Nat. Genet., 14(4), 457-460 (1996) (hereinafter
"the DeRisi article") (copy annexed at Tab E);

(f) Shalon, D., Smith, S.J., and Brown, P.O., A DNA Microarray System
for Analyzing Complex DNA Samples Using Two-color Fluorescent Probe Hybridization, Genome
Res., 6(7), 639-645 (1996) (hereinafter "the Shalon article") (copy annexed at Tab F);

(g) Heller, R.A., Schena, M., Chai, A., Shalon, D., Bedilion, T., Gilmore,
J., Woolley, D.E., and Davis, R.W., Discovery and analysis of inflammatory disease-related genes using
cDNA microarrays, Proc. Natl. Acad. Sci. USA, 94, 2150-2155 (1997) (hereinafter "the Heller
article") (copy annexed at Tab G);

(h) Sambrook, J., Fritsch, E.F., Maniatis, T., Molecular Cloning, A
Laboratory Manual pages 7.37 and 7.38, Cold Spring Harbor Press (1989) (hereinafter "the
Sambrook Manual") (copy annexed at Tab H).

8. Many of the published articles and patent documents I considered (i.e., at least
items (a)-(g) identified in paragraph 7) relate to work done at Stanford University in the early and
mid-1990s with respect to the development of cDNA microarrays for use in gene expression monitoring
applications under which Synteni became exclusively licensed. As I will discuss, a person skilled in the
art who read the Tang '248 application on November 5, 1998 would have understood that application
to disclose the SEQ ID NO:1-encoding polynucleotides to be useful for a number of gene expression
monitoring applications, e.g., as a probe for the expression of that specific polynucleotide in cDNA
microarrays of the type first developed at Stanford.

9. Turning more specifically to the Tang '248 specification, the SEQ ID NO:2
polynucleotide is shown at pp. 3-4 as one of 4 sequences under the heading "Sequence Listing." The
Tang '248 specification specifically teaches that the invention "provides an isolated and purified
polynucleotide encoding the polypeptide comprising the amino acid sequence of SEQ ID NO:1" (Tang
'248 application at p. 2). It further teaches that (a) the identity of the SEQ ID NO:2 polynucleotide
was determined from a colon tumor tissue cDNA library (COLNTUT03) (Tang '248 application at pp.
11 and 33), (b) the SEQ ID NO:2 polynucleotide encodes the human myosin heavy chain homolog
(MHCH) shown as SEQ ID NO:1 (Tang '248 application at p. 11), and (c) northern analysis of SEQ
ID NO:2 shows its expression predominantly in cDNA libraries associated with hematopoietic/immune
system, gastrointestinal, musculoskeletal, and reproductive tissues, and in tissues associated with cancer
(Specification at page 12, lines 4-9).

The Tang '248 application discusses a number of uses of the SEQ ID NO:1-encoding
polynucleotides in addition to their use in gene expression monitoring applications. I have not fully
evaluated these additional uses in connection with the preparation of this Declaration and do not
express any views in this Declaration regarding whether or not the Tang '248 specification discloses
these additional uses to be substantial, specific and credible real-world utilities of the SEQ ID
NO:1-encoding polynucleotides. Consequently, my discussion in this Declaration concerning the Tang '248
application focuses on the portions of the application that relate to the use of the SEQ ID
NO:1-encoding polynucleotides in gene expression monitoring applications.



10. The Tang '248 application discloses that the polynucleotide sequences 
disclosed therein, including the SEQ ID NO:1-encoding polynucleotides, are useful as probes in
microarrays. It further teaches that the microarrays can be used "to monitor the expression level of large 
numbers of genes simultaneously" for a number of purposes, including "to develop and monitor the 
activities of therapeutic agents" (Tang '248 application at p. 31, lines 35-36). 

In the paragraph immediately following the Tang '248 teachings described in the
preceding paragraph of this Declaration, the Tang '248 application teaches that microarrays can be
prepared using the previously mentioned cDNA microarray technology developed at Stanford in the
early to mid-1990s. In this connection, the Tang '248 application specifically cites to the Schena 1996
article identified in item (a) of paragraph 7 of this Declaration (Tang '248 application at p. 32; supra,
paragraph 7).

The Schena 1996 article is one of a number of documents that were published prior to
the November 5, 1998 filing date of the Tang '248 application that describe the use of the Stanford-developed
cDNA microarray technology in a wide range of gene expression monitoring applications, including
monitoring and analyzing gene expression patterns in human cancer. In view of the Tang '248
application, the Schena 1996 article, and other related pre-November 1998 publications, persons
skilled in the art on November 5, 1998 clearly would have understood the Tang '248 application to
disclose the SEQ ID NO:1-encoding polynucleotides to be useful in cDNA microarrays for the
development of new drugs and monitoring the activities of drugs for such purposes as evaluating their
efficacy and toxicity, as explained more fully in paragraph 15 below.

With specific reference to toxicity evaluations, those of skill in the art who were
working on drug development in November 1998 (and for many years prior to November 1998)
without any doubt appreciated that the toxicity (or lack of toxicity) of any proposed drug they were
working on was one of the most important criteria to be considered and evaluated in connection with
the development of the drug. They would have understood at that time that good drugs are not only
potent, they are specific. This means that they have strong effects on a specific biological target and
minimal effects on all other biological targets. Ascertaining that a candidate drug affects its intended
target, and identification of undesirable secondary effects (i.e., toxic side effects), had been for many
years among the main challenges in developing new drugs. The ability to determine which genes are
positively affected by a given drug, coupled with the ability to identify, quickly and as early as possible in
the drug development process, drugs that are likely to be toxic because of their undesirable
secondary effects, has enormous value in improving the efficiency of the drug discovery process, and
is an important and essential part of the development of any new drug. Accordingly, the teachings in
the Tang '248 application, in particular regarding use of the SEQ ID NO:1-encoding polynucleotides in
differential gene expression analysis and in the development and the monitoring of the activities of drugs,
clearly include toxicity studies, and persons skilled in the art who read the Tang '248 application on
November 5, 1998 would have understood that to be so.

11. The Schena 1996 article was not the first publication that described the use of 
the cDNA microarray technique developed at Stanford to monitor quantitatively gene expression 
patterns. More than a year earlier (i.e., in October 1995), the Schena 1995 article, titled "Quantitative
Monitoring of Gene Expression Patterns with a Complementary DNA Microarray", was published (see
Tabs A and B). 

12. As previously discussed (supra, paragraphs 2 and 7), in the mid-1990s patent
applications were filed in the names of Drs. Shalon and Brown that described the Stanford-developed
cDNA microarray technology. The two patent documents (i.e., the Shalon PCT application and the
Brown '522 patent) annexed to this Declaration at Tabs C and D evidence information that was
available to the public regarding the Stanford-developed cDNA microarray technology before the
November 5, 1998 filing date of the Tang '248 application.

The Shalon PCT patent application, which was published in December 1995, contains
virtually the same (if not exactly the same) specification as the Brown '522 patent. Hence, the Brown
'522 patent disclosure was, in effect, available to the public as of the December 1995 publication date
of the Shalon PCT application (see Tabs C and D). For the sake of convenience, I cite to and discuss
the Brown '522 specification below on the understanding that the descriptions in that specification were
published as of the December 28, 1995 publication date of the Shalon PCT application.

The Brown '522 patent discusses, in detail, the utility of the Stanford-developed cDNA
microarrays in gene expression monitoring applications. For example, in the "Summary Of The
Invention" section, the Brown '522 patent teaches (see Tab D, col. 4, line 52-col. 5, line 8):

Also forming part of the invention is a method of detecting 
differential expression of each of a plurality of genes in a first cell type, 
with respect to expression of the same genes in a second cell type. In 
practicing the method, there is first produced fluorescent-labeled 
cDNAs from mRNAs isolated from two cells types, where the cDNAs 
from the first and second cell types are labeled with first and second 
different flourescent reporters. 

A mixture of the labeled cDNAs from the two cell types is 
added to an array of polynucleotides representing a plurality of known 
genes derived from the two cell types, under conditions that result in 
hybridization of the cDNAs to complementary-sequence 
polynucleotides in the array. The array is then examined by 
fluorescence under fluorescence excitation conditions in which (i) 
polynucleotides in the array that are hybridized predominantly to 
cDNAs derived from one of the first or second cell types give a distinct 
first and second fluorescence emission color, respectively, and (ii) 
polynucleotides in the array that are hybridized to substantially equal 
numbers of cDNAs derived from the first and second cell types give a 
distinct combined fluorescence emission color, respectively. The 
relative expression of known genes in the two cell types can then be 
determined by the observed fluorescence emission color of each spot. 
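The color-based readout in the passage quoted above can be illustrated with a short sketch. It makes several assumptions that do not come from the Brown '522 patent. Each spot is assumed to yield a background-corrected intensity in each of the two reporter channels. The spot is then called by the base-2 log ratio of those intensities, against a hypothetical two-fold cutoff for "predominantly" one cell type. The gene names and intensity values are made up for illustration only.

```python
import math


def classify_spot(first: float, second: float, fold: float = 2.0) -> str:
    """Call a spot by the ratio of its two reporter-channel intensities.

    'first'/'second': hybridized predominantly to cDNA from that cell type.
    'combined': roughly equal amounts from both cell types.
    The fold cutoff is an illustrative assumption, not part of the source.
    """
    if first <= 0 or second <= 0:
        raise ValueError("channel intensities must be positive")
    log_ratio = math.log2(first / second)
    cutoff = math.log2(fold)
    if log_ratio >= cutoff:
        return "first"
    if log_ratio <= -cutoff:
        return "second"
    return "combined"


# Hypothetical (first-channel, second-channel) intensities for three spots.
spots = {"gene_a": (5200.0, 800.0), "gene_b": (1500.0, 1400.0), "gene_c": (300.0, 2700.0)}
for gene, (f, s) in spots.items():
    print(gene, classify_spot(f, s))
```

A log ratio is used so that a two-fold excess in either channel sits the same distance from zero, which keeps the two single-color calls symmetric.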

The Brown '522 patent further teaches that the "[m]icroarrays of immobilized nucleic
acid sequences prepared in accordance with the invention" can be used in "numerous" genetic
applications, including "monitoring of gene expression" applications (see Tab D at col. 14, lines 36-42).
The Brown '522 patent teaches (a) monitoring gene expression (i) in different tissue types, (ii) in 
different disease states, and (iii) in response to different drugs, and (b) that arrays disclosed therein may 
be used in toxicology studies (see Tab D at col. 15, lines 13-18 and 52-58 and col. 18, lines 25-30). 
13. Also pertinent to my considerations underlying this Declaration is the DeRisi
article, published in December 1996. The DeRisi article describes the use of the Stanford-developed 
cDNA microarray technology "to analyze gene expression patterns in human cancer" (see Tab E at, 
e.g., p. 457). The DeRisi article specifically indicates, consistent with what was apparent to persons 
skilled in the art in December 1996, that increasing the number of genes on the cDNA microarray 
permits a "more comprehensive survey of gene expression patterns," thereby enhancing the ability of 
the cDNA microarray to provide "new and useful insights into human biology and a deeper 
understanding of the gene pathways involved in the pathogenesis of cancer and other diseases" (see 
Tab E at p. 458). 

14. Other pre-November 1998 publications further evidence the utility of the 
cDNA microarrays first developed at Stanford in a wide range of gene expression monitoring 
applications (see, e.g., the Shalon and the Heller articles at Tabs F and G). By no later than the March 
1997 publication of the Heller article, these publications showed that employees of Synteni (i.e., James 
Gilmore and myself) had used the cDNA microarrays in specific gene expression monitoring 
applications (see Tab G). 

The Heller article states that the results reported therein "successfully demonstrate the 
use of the cDNA microarray system as a general approach for dissecting human diseases" (Tab G at 
p. 2150). Among other things, the Heller article describes the investigation of "1000 human genes that 
were randomly selected from a peripheral human blood cell library" and "[t]heir differential and 
quantitative expression analysis in cells of the joint tissue. . . to demonstrate the utility of the microarray 
method to analyze complex diseases by their pattern of gene expression" (see Tab G at pp. 2150 et 
seq.). 

Much of the work reported on in the Heller article was done in 1996. That article, 
therefore, evidences how persons skilled in the art were readily able, well prior to November 5, 1998, 
to make and use cDNA microarrays to achieve highly useful results. For example, as reported in the 
Heller article, a cDNA microarray that was used in some of the highly successful work reported on 
therein was made from 1,000 genes randomly selected from a human blood cell library. 
15. A person skilled in the art on November 5, 1998, who read the Tang '248
application, would understand that application to disclose the SEQ ID NO:1-encoding polynucleotides,
for example, SEQ ID NO:2, to be highly useful as probes for the expression of that specific
polynucleotide in cDNA microarrays of the type first developed at Stanford. For example, the
specification of the Tang '248 application would have led a person skilled in the art in November 1998
who was using gene expression monitoring in connection with working on developing new drugs for the
treatment of heart and skeletal muscle disorders, developmental disorders, and cell proliferative
disorders, including cancer, to conclude that a cDNA microarray that contained the SEQ ID NO:1-encoding
polynucleotides would be a highly useful tool and to request specifically that any cDNA
microarray that was being used for such purposes contain the SEQ ID NO:1-encoding polynucleotides.
Persons skilled in the art would appreciate that cDNA microarrays that contained the SEQ ID NO:1-encoding
polynucleotides would be a more useful tool than cDNA microarrays that did not contain the
polynucleotides in connection with conducting gene expression monitoring studies on proposed (or
actual) drugs for treating heart and skeletal muscle disorders, developmental disorders, and cell
proliferative disorders, including cancer, for such purposes as evaluating their efficacy and toxicity.

I discuss in more detail in items (a)-(g) below a number of reasons why a person skilled
in the art, who read the Tang '248 specification in November 1998, would have concluded based on
that specification and the state of the art at that time, that the SEQ ID NO:1-encoding polynucleotides
would be a highly useful tool for inclusion in cDNA microarrays for evaluating the efficacy and toxicity
of proposed drugs for treating heart and skeletal muscle disorders, developmental disorders, and cell
proliferative disorders, including cancer, as well as for other evaluations:

(a) The Tang '248 application teaches the SEQ ID NO:1-encoding
polynucleotides to be useful as probes in cDNA microarrays of the type first developed at Stanford. It
also teaches that such cDNA microarrays are useful in a number of gene expression monitoring
applications, including "developing and monitoring the activity of therapeutic agents [i.e., drugs]" (see
paragraph 10, supra).

(b) By November 1998, the Stanford-developed cDNA microarray technology
was a well known and widely accepted tool for use in a wide range of gene expression monitoring
applications. This is evidenced, for example, by numerous publications describing the use of that
cDNA technology in gene expression monitoring applications and the fact that, for over a year, the
technology had provided the basis for the operations of an up-and-running company (Synteni), with
employees, that was created for the purpose of developing and commercially exploiting that technology
(see paragraphs 2, 8 and 10-14, supra). The fact that Incyte agreed to purchase Synteni in late 1997
for an amount reported to be at least about $80 million only serves to underscore the substantial
practical and commercial significance, in 1997, of the cDNA microarray technology first developed at
Stanford (see paragraph 2, supra).

(c) The pre-November 1998 publications regarding the cDNA microarray 
technology first developed at Stanford that I discuss in this Declaration repeatedly confirm that, 
consistent with the teachings in the Tang '248 application, cDNA microarrays are highly useful tools for 
conducting gene expression monitoring applications with respect to the development of drugs and the 
monitoring of their activity. Among other things, those pre-November 1998 publications confirmed that 
cDNA microarrays (i) were useful for monitoring gene expression responses to different drugs (see 
paragraph 12, supra), (ii) were useful in analyzing gene expression patterns in human cancer, with 
increasing the number of genes on the cDNA microarray enhancing the ability of the cDNA microarray 
to provide useful information (see paragraph 13, supra), and (iii) were a valuable tool for use as part of 
a "general approach for dissecting human diseases" and for "analyz[ing] complex diseases by their 
pattern of gene expression" (see paragraph 14, supra). 

(d) Based on my own extensive work for a company whose business was the 
development and commercial exploitation of cDNA microarray technology for more than two years 
prior to the November 1998 filing date of the Tang '248 application, I have first-hand knowledge 
concerning the state of the art with respect to making and using cDNA microarrays as of November 5, 
1998 (see paragraphs 2 and 14, supra). Persons skilled in the art as of that date would have (a)
concluded that the Tang '248 application disclosed cDNA microarrays containing the SEQ ID NO:1-encoding
polynucleotides to be useful, and (b) readily been able to make and use such microarrays with
useful results.

(e) The Tang '248 specification contains a number of teachings that would lead
persons skilled in the art on November 5, 1998 to conclude that a cDNA microarray that contained the
SEQ ID NO:1-encoding polynucleotides would be a more useful tool for gene expression monitoring
applications relating to drugs for treating heart and skeletal muscle disorders, developmental disorders,
and cell proliferative disorders, including cancer, than a cDNA microarray that did not contain the SEQ
ID NO:1-encoding polynucleotides. Among other things, the Tang '248 specification teaches that the
identity of the SEQ ID NO:2 polynucleotide was determined from a colon tumor tissue cDNA library
(COLNTUT03) (Tang '248 application at pp. 11 and 33). Moreover, northern analysis of SEQ ID
NO:2 shows its expression predominantly in cDNA libraries associated with hematopoietic/immune
system, gastrointestinal, musculoskeletal, and reproductive tissues, and in tissues associated with cancer
(Specification at page 12, lines 4-9). (See paragraph 9, supra).

Moreover, the Tang '248 specification teaches that the MHCH protein having the amino acid
sequence of SEQ ID NO:1 shares homology with known functional proteins. MHCH is a member of
the human receptor protein family. In particular, SEQ ID NO:1 shares homology with the sequence of
C. elegans myosin (g1279777) and H. annuus unconventional myosin (g2444174). (Tang '248
application, at p. 11).

(f) Persons skilled in the art on November 5, 1998 would have appreciated (i)
that the gene expression monitoring results obtained using a cDNA microarray containing a probe to a
sequence selected from the group consisting of SEQ ID NO:1-encoding polynucleotides would vary,
depending on the particular drug being evaluated, and (ii) that such varying results would occur both
with respect to the results obtained from the probe described in (i) and from the cDNA microarray as a
whole (including all its other individual probes). These kinds of varying results, depending on the
identity of the drug being tested, in no way detract from my conclusion that persons skilled in the art on
November 5, 1998, having read the Tang '248 specification, would specifically request that any cDNA
microarray that was being used for conducting gene expression monitoring studies on drugs for treating
heart and skeletal muscle disorders, developmental disorders, and cell proliferative disorders, including
cancer (e.g., a toxicology study or any efficacy study of the type that typically takes place in connection
with the development of a drug), contain any one of the SEQ ID NO:1-encoding polynucleotides as a
probe. Persons skilled in the art on November 5, 1998 would have wanted their cDNA microarray to
have a probe as described in (i) because a microarray that contained such a probe (as compared to
one that did not) would provide more useful results in the kind of gene expression monitoring studies
using cDNA microarrays that persons skilled in the art have been doing since well prior to November
5, 1998.

The foregoing is not intended to be an all-inclusive explanation of all my reasons for
reaching the conclusions stated in this paragraph 15, and in paragraph 6, supra. In my view, however,
it provides more than sufficient reasons to justify my conclusions stated in paragraph 6 of this
Declaration regarding the Tang '248 application disclosing to persons skilled in the art at the time of its
filing substantial, specific and credible real-world utilities for the SEQ ID NO:1-encoding
polynucleotides.

16. Also pertinent to my considerations underlying this Declaration is the fact that the 
Tang '248 disclosure regarding the uses of the SEQ ID NO:2 polynucleotide for gene expression 
monitoring applications is not limited to the use of that polynucleotide as a probe in microarrays. For 
one thing, the Tang '248 disclosure regarding the hybridization technique used in gene expression 
monitoring applications is broad (Tang '248 application at, e.g., p. 3, lines 4-9). 

In addition, the Tang '248 specification repeatedly teaches that the polynucleotides 
described therein (including the polynucleotide of SEQ ID NO:2) may desirably be used as probes in 
any of a number of long established "standard" non-microarray techniques, such as Northern analysis, 
for conducting gene expression monitoring studies. See, e.g.: 

(a) Tang '248 application at p. 7, lines 11-13 ("[N]orthern analysis is indicative
of the presence of nucleic acids encoding MHCH in a sample, and thereby correlates with expression 
of the transcript from the polynucleotide encoding MHCH"); 

(b) Tang '248 application at p. 30, lines 24-27 ("The polynucleotide sequences 
encoding MHCH may be used in Southern or northern analysis, dot blot, or other membrane-based 
technologies; in PCR technologies; in dipstick, pin, and multiformat ELISA-like assays; and in 
microarrays utilizing fluids or tissues from patients to detect altered MHCH expression. Such 
qualitative or quantitative methods are well known in the art"); 

(c) Tang '248 application at p. 31, lines 1-10 ("In order to provide a basis for 
the diagnosis of a disorder associated with expression of MHCH, a normal or standard profile for 
expression is established. This may be accomplished by combining body fluids or cell extracts taken 
from normal subjects, either animal or human, with a sequence, or a fragment thereof, encoding
MHCH, under conditions suitable for hybridization or amplification. Standard hybridization may be 
quantified by comparing the values obtained from normal subjects with values from an experiment in 
which a known amount of a substantially purified polynucleotide is used. Standard values obtained in 
this manner may be compared with values obtained from samples from patients who are symptomatic 
for a disorder. Deviation from standard values is used to establish the presence of a disorder"); and 

(d) Tang '248 application at p. 35, lines 14-17 ("Northern analysis is a 
laboratory technique used to detect the presence of a transcript of a gene and involves the hybridization of 
a labeled nucleotide sequence to a membrane on which RNAs from a particular cell type or tissue have 
been bound. (See, e.g., Sambrook, supra, ch. 7; Ausubel, 1995, supra, ch. 4 and 16.)").

The "Sambrook et al." reference cited in item (d) immediately above is a reference that 
was well known to persons skilled in the art in November 1998. A copy of pages from that reference 
manual, which was published in 1989, is annexed to this Declaration at Tab H. The attached pages 
from the Sambrook manual provide an overview of northern analysis and other membrane-based 
technologies for conducting gene expression monitoring studies that were known and used by persons 
skilled in the art for many years prior to the November 5, 1998 filing date of the Tang '248 application. 

A person skilled in the art on November 5, 1998, who read the Tang '248
specification, would have routinely and readily appreciated that the SEQ ID NO:1-encoding
polynucleotides disclosed therein would be useful as a probe to conduct gene expression monitoring
analyses using northern analysis or any of the other traditional membrane-based gene expression
monitoring techniques that were known and in common use many years prior to the filing of the Tang
'248 application. For example, a person skilled in the art in November 1998 would have routinely and
readily appreciated that the SEQ ID NO:1-encoding polynucleotides would be a useful tool in
conducting gene expression analyses, using the northern analysis technique, in furtherance of (a) the 
development of drugs for the treatment of heart and skeletal muscle disorders, developmental 
disorders, and cell proliferative disorders, including cancer, and (b) analyses of the efficacy and toxicity 
of such drugs. 
Field of the Invention

This invention relates to a method and apparatus
for fabricating microarrays of biological samples for
large scale screening assays, such as arrays of DNA
samples to be used in DNA hybridization assays for
genetic research and diagnostic applications.

References

Abouzied, et al., Journal of AOAC International 27(2):495-500 (1994).

Bohlander, et al., Genomics 13:1322-1324 (1992).

Drmanac, et al., Science 260:1649-1652 (1993).

Fodor, et al., Science 251:767-773 (1991).

Khrapko, et al., DNA Sequence 1:375-388 (1991).

Kuriyama, et al., An ISFET Biosensor, Applied Biosensors (Donald Wise, Ed.), Butterworths, pp. 93-114 (1989).

Lehrach, et al., Hybridization Fingerprinting in Genome Mapping and Sequencing, Genome Analysis, Vol. 1 (Davies and Tilgham, Eds.), Cold Spring Harbor Press, pp. 39-81 (1990).

Maniatis, et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press (1989).

Nelson, et al., Nature Genetics 4:11-18 (1993).

WO 95/35505
PCT/US95/07659

Pirrung, et al., U.S. Patent No. 5,143,854 (1992).

Riles, et al., Genetics 134:81-150 (1993).

Schena, M. et al., Proc. Nat. Acad. Sci. USA
Background of the Invention

A variety of methods are currently available for
making arrays of biological macromolecules, such as
arrays of nucleic acid molecules or proteins. One
method for making ordered arrays of DNA on a porous
membrane is a "dot blot" approach. In this method, a
vacuum manifold transfers a plurality, e.g., 96,
aqueous samples of DNA from 3 millimeter diameter wells
to a porous membrane. A common variant of this
procedure is a "slot-blot" method in which the wells
have highly-elongated oval shapes.

The DNA is immobilized on the porous membrane by
baking the membrane or exposing it to UV radiation.
This is a manual procedure practical for making one
array at a time and usually limited to 96 samples per
array. "Dot-blot" procedures are therefore inadequate
for applications in which many thousand samples must be
determined.

A more efficient technique employed for making
ordered arrays of genomic fragments uses an array of
pins dipped into the wells, e.g., the 96 wells of a
microtiter plate, for transferring an array of samples
to a substrate, such as a porous membrane. One array
includes pins that are designed to spot a membrane in a
staggered fashion, for creating an array of 9216 spots
in a 22 x 22 cm area (Lehrach, et al., 1990). A
limitation with this approach is that the volume of DNA
spotted in each pixel of each array is highly variable.
In addition, the number of arrays that can be made with
each dipping is usually quite small.

An alternate method of creating ordered arrays of
nucleic acid sequences is described by Pirrung, et al.
(1992), and also by Fodor, et al. (1991). The method
involves synthesizing different nucleic acid sequences
at different discrete regions of a support. This
method employs elaborate synthetic schemes, and is
generally limited to relatively short nucleic acid
samples, e.g., less than 20 bases. A related method has
been described by Southern, et al. (1992).

Khrapko, et al. (1991) describes a method of
making an oligonucleotide matrix by spotting DNA onto a
thin layer of polyacrylamide. The spotting is done
manually with a micropipette.

None of the methods or devices described in the
prior art are designed for mass fabrication of
microarrays characterized by (i) a large number of
micro-sized assay regions separated by a distance of
50-200 microns or less, and (ii) a well-defined amount,
typically in the picomole range, of analyte associated
with each region of the array.
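As a rough illustration of the difference in density, assume the 9216 spots of the Lehrach pin array described above are spread evenly over the 22 x 22 cm area. The staggered layout is not further specified, so this is an average figure only. Under that assumption, the average spacing is about an order of magnitude or more coarser than the 50-200 micron spacing recited above:

```latex
\[
\frac{22\,\text{cm} \times 22\,\text{cm}}{9216\ \text{spots}}
  = \frac{484\,\text{cm}^2}{9216} \approx 0.0525\,\text{cm}^2 \approx 5.25\,\text{mm}^2 \ \text{per spot}
\]
\[
\text{average pitch} \approx \sqrt{5.25\,\text{mm}^2} \approx 2.29\,\text{mm}
  = \frac{220\,\text{mm}}{96}, \qquad 9216 = 96^2
\]
\[
\frac{2290\,\mu\text{m}}{200\,\mu\text{m}} \approx 11.5,
\qquad
\frac{2290\,\mu\text{m}}{50\,\mu\text{m}} \approx 46
\]
```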

Furthermore, current technology is directed at
performing such assays one at a time to a single array
of DNA molecules. For example, the most common method
for performing DNA hybridizations to arrays spotted
onto porous membrane involves sealing the membrane in a
plastic bag (Maniatis, et al., 1989) or a rotating
glass cylinder (Robbins Scientific) with the labeled
hybridization probe inside the sealed chamber. For
arrays made on non-porous surfaces, such as a
microscope slide, each array is incubated with the
labeled hybridization probe sealed under a coverslip.
These techniques require a separate sealed chamber for
each array, which makes the screening and handling of
many such arrays inconvenient and time intensive.

Abouzied, et al. (1994) describes a method of
printing horizontal lines of antibodies on a
nitrocellulose membrane and separating regions of the
membrane with vertical stripes of a hydrophobic
material. Each vertical stripe is then reacted with a
different antigen and the reaction between the
immobilized antibody and an antigen is detected using a
standard ELISA colorimetric technique. Abouzied's
technique makes it possible to screen many one-dimensional
arrays simultaneously on a single sheet of
nitrocellulose. Abouzied makes the nitrocellulose
somewhat hydrophobic using a line drawn with PAP Pen
(Research Products International). However, Abouzied
does not describe a technology that is capable of
completely sealing the pores of the nitrocellulose. The
pores of the nitrocellulose are still physically open,
and so the assay reagents can leak through the
hydrophobic barrier during extended high temperature
incubations or in the presence of detergents, which
makes the Abouzied technique unacceptable for DNA
hybridization assays.

Porous membranes with printed patterns of
hydrophilic/hydrophobic regions exist for applications
such as ordered arrays of bacteria colonies. QA Life
Sciences (San Diego CA) makes such a membrane with a
grid pattern printed on it. However, this membrane has
the same disadvantage as the Abouzied technique since
reagents can still flow between the gridded arrays,
making them unusable for separate DNA hybridization
assays.

Pall Corporation makes a 96-well plate with a
porous filter heat sealed to the bottom of the plate.
These plates are capable of containing different
reagents in each well without cross-contamination.
However, each well is intended to hold only one target
element, whereas the invention described here makes a
microarray of many biomolecules in each subdivided
region of the solid support. Furthermore, the 96-well
plates are at least 1 cm thick and prevent the use of
the device for many colorimetric, fluorescent and
radioactive detection formats which require that the
membrane lie flat against the detection surface. The
invention described here requires no further processing
after the assay step since the barrier elements are
shallow and do not interfere with the detection step,
thereby greatly increasing convenience.

Hyseq Corporation has described a method of making
an "array of arrays" on a non-porous solid support for
use with their sequencing by hybridization technique.
The method described by Hyseq involves modifying the
chemistry of the solid support material to form a
hydrophobic grid pattern where each subdivided region
contains a microarray of biomolecules. Hyseq's flat
hydrophobic pattern does not make use of physical
blocking as an additional means of preventing cross-contamination.

Summary of the Invention

The invention includes, in one aspect, a method of
forming a microarray of analyte-assay regions on a
solid support, where each region in the array has a
known amount of a selected, analyte-specific reagent.

The method involves first loading a solution of a
selected analyte-specific reagent in a reagent-dispensing
device having an elongate capillary channel
(i) formed by spaced-apart, coextensive elongate
members, (ii) adapted to hold a quantity of the reagent
solution and (iii) having a tip region at which aqueous
solution in the channel forms a meniscus. The channel
is preferably formed by a pair of spaced-apart tapered
elements.

The tip of the dispensing device is tapped against
a solid support at a defined position on the support
surface with an impulse effective to break the meniscus
in the capillary channel and deposit a selected volume of
solution on the surface, preferably a selected volume
in the range 0.01 to 100 nl. The two steps are
repeated until the desired array is formed.

The method may be practiced in forming a plurality
of such arrays, where the solution-depositing step is
applied to a selected position on each of a
plurality of solid supports at each repeat cycle.

The dispensing device may be loaded with a new
solution by the steps of (i) dipping the capillary
channel of the device in a wash solution, (ii) removing
wash solution drawn into the capillary channel, and
(iii) dipping the capillary channel into the new
reagent solution.

Also included in the invention is an automated apparatus for forming a microarray of analyte-assay regions on a plurality of solid supports, where each region in the array has a known amount of a selected, analyte-specific reagent. The apparatus has a holder for holding, at known positions, a plurality of planar supports, and a reagent-dispensing device of the type described above.

The apparatus further includes positioning structure for positioning the dispensing device at a selected array position with respect to a support in said holder, and dispensing structure for moving the dispensing device into tapping engagement against a support with a selected impulse effective to deposit a selected volume on the support, e.g., a selected volume in the range 0.01 to 100 nl.

The positioning and dispensing structures are controlled by a control unit in the apparatus. The unit operates to (i) place the dispensing device at a loading station, (ii) move the capillary channel in the device into a selected reagent at the loading station, to load the dispensing device with the reagent, and (iii) dispense the reagent at a defined array position on each of the supports on said holder. The unit may further operate, at the end of a dispensing cycle, to wash the dispensing device by (i) placing the dispensing device at a washing station, (ii) moving the capillary channel in the device into a wash fluid, to load the dispensing device with the fluid, and (iii) removing the wash fluid prior to loading the dispensing device with a fresh selected reagent.

The dispensing device in the apparatus may be one of a plurality of such devices which are carried on the arm for dispensing different analyte-assay reagents at selected, spaced array positions.

In another aspect, the invention includes a substrate with a surface having a microarray of at least 10^ distinct polynucleotide or polypeptide biopolymers in a surface area of less than about 1 cm². Each distinct biopolymer (i) is disposed at a separate, defined position in said array, (ii) has a length of at least 50 subunits, and (iii) is present in a defined amount between about 0.1 femtomoles and 100 nanomoles.

In one embodiment, the surface is a glass slide surface coated with a polycationic polymer, such as polylysine, and the biopolymers are polynucleotides.
In another embodiment, the substrate has a water-impermeable backing, a water-permeable film formed on the backing, and a grid formed on the film. The grid is composed of intersecting water-impervious grid elements extending from said backing to positions raised above the surface of said film, and partitions the film into a plurality of water-impervious cells. A biopolymer array is formed within each cell.

More generally, there is provided a substrate for use in detecting binding of labeled polynucleotides to one or more of a plurality of different-sequence, immobilized polynucleotides. The substrate includes, in one aspect, a glass support, a coating of a polycationic polymer, such as polylysine, on a surface of the support, and an array of distinct polynucleotides electrostatically bound non-covalently to said coating, where each distinct biopolymer is disposed at a separate, defined position in a surface array of polynucleotides.

In another aspect, the substrate includes a water-impermeable backing, a water-permeable film formed on the backing, and a grid formed on the film, where the grid is composed of intersecting water-impervious grid elements extending from the backing to positions raised above the surface of the film, forming a plurality of cells. A biopolymer array is formed within each cell.

Also forming part of the invention is a method of detecting differential expression of each of a plurality of genes in a first cell type, with respect to expression of the same genes in a second cell type. In practicing the method, fluorescent-labeled cDNAs are first produced from mRNAs isolated from the two cell types, where the cDNAs from the first and second cells are labeled with first and second different fluorescent reporters.

A mixture of the labeled cDNAs from the two cell types is added to an array of polynucleotides representing a plurality of known genes derived from the two cell types, under conditions that result in hybridization of the cDNAs to complementary-sequence polynucleotides in the array. The array is then examined by fluorescence under fluorescence excitation conditions in which (i) polynucleotides in the array that are hybridized predominantly to cDNAs derived from one of the first and second cell types give a distinct first or second fluorescence emission color, respectively, and (ii) polynucleotides in the array that are hybridized to substantially equal numbers of cDNAs derived from the first and second cell types give a distinct combined fluorescence emission color. The relative expression of known genes in the two cell types can then be determined by the observed fluorescence emission color of each spot.

These and other objects and features of the invention will become more fully apparent when the following detailed description of the invention is read in conjunction with the accompanying figures.

Brief Description of the Drawings
Fig. 1 is a side view of a reagent-dispensing device having an open-capillary dispensing head constructed for use in one embodiment of the invention;

Figs. 2A-2C illustrate steps in the delivery of a fixed-volume bead on a hydrophobic surface employing the dispensing head from Fig. 1, in accordance with one embodiment of the method of the invention;
Fig. 3 shows a portion of a two-dimensional array of analyte-assay regions constructed according to the method of the invention;

Fig. 4 is a planar view showing components of an automated apparatus for forming arrays in accordance with the invention;
Fig. 5 shows a fluorescent image of an actual 20 x 20 array of 400 fluorescently labeled DNA samples immobilized on a poly-L-lysine coated slide, where the total area covered by the 400-element array is 16 square millimeters;

Fig. 6 is a fluorescent image of a 1.8 cm x 1.8 cm microarray containing lambda clones with yeast inserts, the fluorescent signal arising from the hybridization to the array of approximately half the yeast genome labeled with a green fluorophore and the other half labeled with a red fluorophore;

Fig. 7 shows the translation of the hybridization image of Fig. 6 into a karyotype of the yeast genome, where the elements of the Fig. 6 microarray contain yeast DNA sequences that have been previously physically mapped in the yeast genome;

Fig. 8 shows a fluorescent image of a 0.5 cm x 0.5 cm microarray of 24 cDNA clones, where the microarray was hybridized simultaneously with total cDNA from a wild-type Arabidopsis plant labeled with a green fluorophore and total cDNA from a transgenic Arabidopsis plant labeled with a red fluorophore, and the arrow points to the cDNA clone representing the gene introduced into the transgenic Arabidopsis plant;
Fig. 9 shows a plan view of a substrate having an array of cells formed by barrier elements in the form of a grid;

Fig. 10 shows an enlarged plan view of one of the cells in the substrate in Fig. 9, showing an array of polynucleotide regions in the cell;

Fig. 11 is an enlarged sectional view of the substrate in Fig. 9, taken along a section line in that figure; and

Fig. 12 is a scanned image of a 3 cm x 3 cm nitrocellulose solid support containing four identical arrays of M13 clones in each of four quadrants, where each quadrant was hybridized simultaneously to a different oligonucleotide using an open-face hybridization method.


Detailed Description of the Invention

I. Definitions

Unless indicated otherwise, the terms defined 
below have the following meanings: 

"Ligand" refers to one member of a ligand/anti-ligand binding pair. The ligand may be, for example, one of the nucleic acid strands in a complementary, hybridized nucleic acid duplex binding pair; an effector molecule in an effector/receptor binding pair; or an antigen in an antigen/antibody or antigen/antibody fragment binding pair.

"Antiligand" refers to the opposite member of a ligand/anti-ligand binding pair. The antiligand may be the other of the nucleic acid strands in a complementary, hybridized nucleic acid duplex binding pair; the receptor molecule in an effector/receptor binding pair; or an antibody or antibody fragment molecule in an antigen/antibody or antigen/antibody fragment binding pair, respectively.

"Analyte" or "analyte molecule" refers to a molecule, typically a macromolecule, such as a polynucleotide or polypeptide, whose presence, amount, and/or identity are to be determined. The analyte is one member of a ligand/anti-ligand pair.

"Analyte-specific assay reagent" refers to a molecule effective to bind specifically to an analyte molecule. The reagent is the opposite member of a ligand/anti-ligand binding pair.

An "array of regions on a solid support" is a linear or two-dimensional array of preferably discrete regions, each having a finite area, formed on the surface of a solid support.

WO 95/35505 PCT/US95/07659

A "microarray" is an array of regions having a density of discrete regions of at least about 100/cm², and preferably at least about 1000/cm². The regions in a microarray have typical dimensions, e.g., diameters, in the range of about 10-250 µm, and are separated from other regions in the array by about the same distance.

A support surface is "hydrophobic" if an aqueous-medium droplet applied to the surface does not spread out substantially beyond the area size of the applied droplet. That is, the surface acts to prevent spreading of the droplet applied to the surface by hydrophobic interaction with the droplet.

A "meniscus" means a concave or convex surface that forms on the bottom of a liquid in a channel as a result of the surface tension of the liquid.

"Distinct biopolymers", as applied to the biopolymers forming a microarray, means an array member which is distinct from other array members on the basis of a different biopolymer sequence, and/or different concentrations of the same or distinct biopolymers, and/or different mixtures of distinct or different-concentration biopolymers. Thus an array of "distinct polynucleotides" means an array containing, as its members, (i) distinct polynucleotides, which may have a defined amount in each member, (ii) different, graded concentrations of given-sequence polynucleotides, and/or (iii) different-composition mixtures of two or more distinct polynucleotides.

"Cell type" means a cell from a given source, e.g., a tissue or organ, or a cell in a given state of differentiation, or a cell associated with a given pathology or genetic makeup.

II. Method of Microarray Formation
This section describes a method of forming a microarray of analyte-assay regions on a solid support or substrate, where each region in the array has a known amount of a selected, analyte-specific reagent.

Fig. 1 illustrates, in a partially schematic view, a reagent-dispensing device 10 useful in practicing the method. The device generally includes a reagent dispenser 12 having an elongate open capillary channel 14 adapted to hold a quantity of the reagent solution, such as indicated at 16, as will be described below. The capillary channel is formed by a pair of spaced-apart, coextensive, elongate members 12a, 12b which are tapered toward one another and converge at a tip or tip region 18 at the lower end of the channel. More generally, the open channel is formed by at least two elongate, spaced-apart members adapted to hold a quantity of reagent solution and having a tip region at which aqueous solution in the channel forms a meniscus, such as the concave meniscus illustrated at 20 in Fig. 2A. The advantages of the open-channel construction of the dispenser are discussed below.

With continued reference to Fig. 1, the dispenser device also includes structure for moving the dispenser rapidly toward and away from a support surface, for effecting deposition of a known amount of solution in the dispenser on a support, as will be described below with reference to Figs. 2A-2C. In the embodiment shown, this structure includes a solenoid 22 which is activatable to draw a solenoid piston 24 rapidly downward, then release the piston, e.g., under spring bias, to a normal, raised position, as shown. The dispenser is carried on the piston by a connecting member 26, as shown. The just-described moving structure is also referred to herein as dispensing means for moving the dispenser into engagement with a solid support, for dispensing a known volume of fluid on the support.

The dispensing device just described is carried on an arm 28 that may be moved either linearly or in an x-y plane to position the dispenser at a selected deposition position, as will be described.

Figs. 2A-2C illustrate the method of depositing a known amount of reagent solution in the just-described dispenser on the surface of a solid support, such as the support indicated at 30. The support is a polymer, glass, or other solid-material support having a surface indicated at 31.

In one general embodiment, the surface is a relatively hydrophilic, i.e., wettable, surface, such as a surface having native, bound, or covalently attached charged groups. One such surface, described below, is a glass surface having an adsorbed layer of a polycationic polymer, such as poly-L-lysine.

In another embodiment, the surface has or is formed to have a relatively hydrophobic character, i.e., one that causes aqueous medium deposited on the surface to bead. A variety of known hydrophobic polymers, such as polystyrene, polypropylene, or polyethylene, have desired hydrophobic properties, as do glass and a variety of lubricant or other hydrophobic films that may be applied to the support surface.

Initially, the dispenser is loaded with a selected analyte-specific reagent solution, such as by dipping the dispenser tip, after washing, into a solution of the reagent, and allowing filling by capillary flow into the dispenser channel. The dispenser is now moved to a selected position with respect to a support surface, placing the dispenser tip directly above the support-surface position at which the reagent is to be deposited. This movement takes place with the dispenser tip in its raised position, as seen in Fig. 2A, where the tip is typically about 1-5 mm above the surface of the substrate.

With the dispenser so positioned, solenoid 22 is now activated to cause the dispenser tip to move rapidly toward and away from the substrate surface, making momentary contact with the surface, in effect tapping the tip of the dispenser against the support surface. The tapping movement of the tip against the surface acts to break the liquid meniscus in the tip channel, bringing the liquid in the tip into contact with the support surface. This, in turn, produces a flow of liquid into the capillary space between the tip and the surface, acting to draw liquid out of the dispenser channel, as seen in Fig. 2B.

Fig. 2C shows flow of fluid from the tip onto the support surface, which in this case is a hydrophobic surface. The figure illustrates that liquid continues to flow from the dispenser onto the support surface until it forms a liquid bead 32. At a given bead size, i.e., volume, the tendency of liquid to flow onto the surface will be balanced by the hydrophobic surface interaction of the bead with the support surface, which acts to limit the total bead area on the surface, and by the surface tension of the droplet, which tends toward a given bead curvature. At this point, a given bead volume will have formed, and continued contact of the dispenser tip with the bead, as the dispenser tip is being withdrawn, will have little or no effect on bead volume.




For liquid dispensing on a more hydrophilic surface, the liquid will have less of a tendency to bead, and the dispensed volume will be more sensitive to the total dwell time of the dispenser tip in the immediate vicinity of the support surface, e.g., the positions illustrated in Figs. 2B and 2C.

The desired deposition volume, i.e., bead volume, formed by this method is preferably in the range 2 pl (picoliters) to 2 nl (nanoliters), although volumes as high as 100 nl or more may be dispensed. It will be appreciated that the selected dispensed volume will depend on (i) the "footprint" of the dispenser tip, i.e., the size of the area spanned by the tip, (ii) the hydrophobicity of the support surface, and (iii) the time of contact with, and rate of withdrawal of, the tip from the support surface. In addition, bead size may be reduced by increasing the viscosity of the medium, effectively reducing the flow of liquid from the dispenser onto the support surface during contact. The drop size may be further constrained by depositing the drop in a hydrophilic region surrounded by a hydrophobic grid pattern on the support surface.

In a typical embodiment, the dispenser tip is tapped rapidly against the support surface, with a total residence time in contact with the support of less than about 1 msec, and a rate of upward travel from the surface of about 10 cm/sec.

Assuming that the bead that forms on contact with the surface is a hemispherical bead, with a diameter approximately equal to the width of the dispenser tip, as shown in Fig. 2C, the volume of the bead formed in relation to dispenser tip width (d) is given in Table 1 below. As seen, the volume of the bead ranges from about 2 pl to 2 nl as the tip width is increased from about 20 to 200 µm.
Table 1

Tip width (d)    Volume (nl)
20 µm            2 x 10^-3
50 µm            3.1 x 10^-2
100 µm           2.5 x 10^-1
200 µm           2
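The entries in Table 1 follow from the hemispherical-bead assumption stated above: V = (2/3)π(d/2)³ = πd³/12, with 1 nl equal to 10⁶ µm³. A minimal Python sketch of that calculation (the function name `bead_volume_nl` is hypothetical) is:

```python
import math

UM3_PER_NL = 1.0e6  # 1 nl = 1e-12 m^3 = 1e6 cubic micrometers


def bead_volume_nl(tip_width_um: float) -> float:
    """Volume (nl) of a hemispherical bead whose diameter equals the tip width d.

    V = (2/3) * pi * (d/2)**3 = pi * d**3 / 12, computed in um^3 and
    converted to nanoliters.
    """
    d = tip_width_um
    return math.pi * d ** 3 / 12.0 / UM3_PER_NL


for d in (20, 50, 100, 200):
    print(f"{d:>4} um -> {bead_volume_nl(d):.2e} nl")
```

This gives about 2.1 x 10^-3 nl at 20 µm and about 2.1 nl at 200 µm, matching the ends of Table 1; at 50 and 100 µm it gives about 3.3 x 10^-2 and 2.6 x 10^-1 nl, slightly above the tabulated values.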



At a given tip size, bead volume can be reduced in a controlled fashion by increasing surface hydrophobicity, reducing the time of contact of the tip with the surface, increasing the rate of movement of the tip away from the surface, and/or increasing the viscosity of the medium. Once these parameters are fixed, a selected deposition volume in the desired pl to nl range can be achieved in a repeatable fashion.

After depositing a bead at one selected location 
on a support, the tip is typically moved to a 
corresponding position on a second support, a droplet 
is deposited at that position, and this process is 
repeated until a liquid droplet of the reagent has been 
deposited at a selected position on each of a plurality 
of supports. 

The tip is then washed to remove the reagent liquid and filled with another reagent liquid, and this reagent is deposited at another array position on each of the supports. In one embodiment, the tip is washed and refilled by the steps of (i) dipping the capillary channel of the device in a wash solution, (ii) removing wash solution drawn into the capillary channel, and (iii) dipping the capillary channel into the new reagent solution.
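The reagent-by-reagent cycle just described (deposit one reagent at the same position on every support, then wash and reload) can be sketched as a generator of dispenser operations. This is an illustrative sketch only: `print_schedule`, the operation tuples, and the one-position-per-reagent layout are assumptions made for the example.

```python
from typing import Iterator, List, Tuple

Step = Tuple[object, ...]  # e.g. ("wash",), ("load", reagent), ("tap", reagent, support, pos)


def print_schedule(reagents: List[str],
                   positions: List[Tuple[int, int]],
                   n_supports: int) -> Iterator[Step]:
    """Yield dispenser operations in reagent-major order.

    reagents[i] is deposited at array position positions[i] (row, column)
    on every one of n_supports supports.
    """
    if len(reagents) != len(positions):
        raise ValueError("need one array position per reagent")
    for reagent, pos in zip(reagents, positions):
        yield ("wash",)              # dip in wash solution, then remove it
        yield ("load", reagent)      # dip the capillary channel into the new reagent
        for support in range(n_supports):
            yield ("tap", reagent, support, pos)  # one droplet per support
```

Ordering the loops reagent-first means the tip is washed once per reagent rather than once per droplet, which is the practical reason for depositing each reagent on all supports before reloading.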

From the foregoing, it will be appreciated that the tweezers-like, open-capillary dispenser tip provides the advantages that (i) the open channel of the tip facilitates rapid, efficient washing and drying before reloading the tip with a new reagent, (ii) passive capillary action can load the sample directly from a standard microwell plate while retaining sufficient sample in the open capillary reservoir for the printing of numerous arrays, (iii) open capillaries are less prone to clogging than closed capillaries, and (iv) open capillaries do not require a perfectly faced bottom surface for fluid delivery.

A portion of a microarray 36 formed on the surface 38 of a solid support 40 in accordance with the method just described is shown in Fig. 3. The array is formed of a plurality of analyte-specific reagent regions, such as regions 42, where each region may include a different analyte-specific reagent. As indicated above, the diameter of each region is preferably between about 20-200 µm. The spacing between each region and its closest (non-diagonal) neighbor, measured center-to-center (indicated at 44), is preferably in the range of about 20-400 µm. Thus, for example, an array having a center-to-center spacing of about 250 µm contains about 40 regions/cm, or 1,600 regions/cm². After formation of the array, the support is treated to evaporate the liquid of the droplet forming each region, to leave a desired array of dried, relatively flat regions. This drying may be done by heating or under vacuum.
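The density figures above follow from the center-to-center spacing: for a square grid, regions per centimeter is 10,000 µm divided by the spacing, and regions per cm² is the square of that. A small sketch, with a hypothetical helper name and edge effects ignored:

```python
from typing import Tuple

UM_PER_CM = 10_000


def regions_per_cm2(pitch_um: float) -> Tuple[float, float]:
    """Return (regions per cm, regions per cm^2) for a square grid with the
    given center-to-center spacing in micrometers, ignoring edge effects."""
    per_cm = UM_PER_CM / pitch_um
    return per_cm, per_cm ** 2


print(regions_per_cm2(250))  # (40.0, 1600.0)
```

By the same arithmetic, the microarray densities defined earlier, at least about 100/cm² and preferably at least about 1000/cm², correspond to spacings of about 1000 µm and about 316 µm, respectively.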

In some cases, it is desirable to first rehydrate the droplets containing the analyte reagents to allow more time for adsorption to the solid support. It is also possible to spot out the analyte reagents in a humid environment so that droplets do not dry until the arraying operation is complete.
III. Automated Apparatus for Forming Arrays

In another aspect, the invention includes an automated apparatus for forming an array of analyte-assay regions on a solid support, where each region in the array has a known amount of a selected, analyte-specific reagent.

The apparatus is shown in planar, partially schematic view in Fig. 4. A dispenser device 72 in the apparatus has the basic construction described above with respect to Fig. 1, and includes a dispenser 74 having an open-capillary channel terminating at a tip, substantially as shown in Figs. 1 and 2A-2C.

The dispenser is mounted in the device for movement toward and away from a dispensing position at which the tip of the dispenser taps a support surface, to dispense a selected volume of reagent solution, as described above. This movement is effected by a solenoid 76, as described above. Solenoid 76 is under the control of a control unit 77 whose operation will be described below. The solenoid is also referred to herein as dispensing means for moving the device into tapping engagement with a support, when the device is positioned at a defined array position with respect to that support.

The dispenser device is carried on an arm 74 which is threadedly mounted on a worm screw 80 driven (rotated) in a desired direction by a stepper motor 82, also under the control of unit 77. At its left end in the figure, screw 80 is carried in a sleeve 84 for rotation about the screw axis. At its other end, the screw is mounted to the drive shaft of the stepper motor, which in turn is carried on a sleeve 86. The dispenser device, the worm screw, the two sleeves mounting the worm screw, and the stepper motor used in moving the device in the "x" (horizontal) direction in the figure form what is referred to here collectively as a displacement assembly 86.

The displacement assembly is constructed to produce precise, micro-range movement in the direction of the screw, i.e., along an x axis in the figure. In one mode, the assembly functions to move the dispenser in x-axis increments having a selected distance in the range 5-25 µm. In another mode, the dispenser unit may be moved in precise x-axis increments of several microns or more, for positioning the dispenser at associated positions on adjacent supports, as will be described below.

The displacement assembly, in turn, is mounted for movement in the "y" (vertical) axis of the figure, for positioning the dispenser at a selected y-axis position. The structure mounting the assembly includes a fixed rod 88 mounted rigidly between a pair of frame bars 90, 92, and a worm screw 94 mounted for rotation between a pair of frame bars 96, 98. The worm screw is driven (rotated) by a stepper motor 100 which operates under the control of unit 77. The motor is mounted on bar 96, as shown.

The structure just described, including worm screw 94 and motor 100, is constructed to produce precise, micro-range movement in the direction of the screw, i.e., along a y axis in the figure. As above, the structure functions in one mode to move the dispenser in y-axis increments having a selected distance in the range 5-250 µm, and in a second mode to move the dispenser in precise y-axis increments of several microns or more, for positioning the dispenser at associated positions on adjacent supports.

The displacement assembly and structure for moving this assembly in the y axis are referred to herein collectively as positioning means for positioning the dispensing device at a selected array position with respect to a support.
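How the control unit converts a requested increment into motor steps is not specified in the text. Assuming a conventional lead screw driven by a full-stepping motor, the conversion is a division by the travel per step. The 500 µm lead, the 200 steps per revolution, and the name `steps_for_move` below are hypothetical values chosen for illustration.

```python
def steps_for_move(distance_um: float,
                   screw_lead_um: float = 500.0,
                   steps_per_rev: int = 200) -> int:
    """Whole stepper-motor steps needed to move the carriage a given distance.

    Hypothetical defaults: a screw lead of 500 um per revolution and a
    200 step/rev motor, i.e. 2.5 um of travel per full step. Distances that
    are not a multiple of the step size are rounded to the nearest step.
    """
    if screw_lead_um <= 0 or steps_per_rev <= 0:
        raise ValueError("lead and steps per revolution must be positive")
    um_per_step = screw_lead_um / steps_per_rev
    return round(distance_um / um_per_step)
```

Under these assumed numbers, the 5-25 µm x-axis increments described above correspond to 2-10 steps, and the travel per step (2.5 µm) sets the finest positioning resolution.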

A holder 102 in the apparatus functions to hold a plurality of supports, such as supports 104, on which the microarrays of reagent regions are to be formed by the apparatus. The holder provides a number of recessed slots, such as slot 106, which receive the supports and position them at precise selected positions with respect to the frame bars on which the dispenser moving means is mounted.

As noted above, the control unit in the device functions to actuate the two stepper motors and the dispenser solenoid in a sequence designed for automated operation of the apparatus in forming a selected microarray of reagent regions on each of a plurality of supports.

The control unit is constructed, according to conventional microprocessor control principles, to provide appropriate signals to the solenoid and to each of the stepper motors, in a given timed sequence and for an appropriate signalling time. The construction of the unit, and the settings that are selected by the user to achieve a desired array pattern, will be understood from the following description of a typical apparatus operation.

Initially, one or more supports are placed in one or more slots in the holder. The dispenser is then moved to a position directly above a well (not shown) containing a solution of the first reagent to be dispensed on the support(s). The dispenser solenoid is now actuated to lower the dispenser tip into this well, causing the capillary channel in the dispenser to fill. Motors 82, 100 are now actuated to position the dispenser at a selected array position on the first of the supports. Solenoid actuation of the dispenser is then effective to dispense a selected-volume droplet of that reagent at this location. As noted above, this operation is effective to dispense a selected volume, preferably between 2 pl and 2 nl, of the reagent solution.

The dispenser is now moved to the corresponding position on an adjacent support, and a similar volume of the solution is dispensed at this position. The process is repeated until the reagent has been dispensed at this preselected corresponding position on each of the supports.

Where it is desired to dispense a single reagent at more than two array positions on a support, the dispenser may be moved to the different array positions on each support before being moved to a new support, or the solution can be dispensed at one selected position on each support, with the cycle then repeated for each new array position.
To dispense the next reagent, the dispenser is positioned over a wash solution (not shown), and the dispenser tip is dipped in and out of this solution until the reagent solution has been substantially washed from the tip. Solution can be removed from the tip, after each dipping, by vacuum, compressed-air spray, sponge, or the like.

The dispenser tip is now dipped in a second reagent well, and the filled tip is moved to a second selected array position on the first support. The process of dispensing reagent at each of the corresponding second-array positions is then carried out as above. This process is repeated until an entire microarray of reagent solutions has been formed on each of the supports.
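Because the holder slots fix each support at a known position, the "corresponding position" on every support can be computed from a per-support origin plus the array offset. The sketch below assumes a square grid with uniform center-to-center pitch; `target_xy` and the origin coordinates in the example are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def target_xy(support_origins_um: List[Point],
              row: int, col: int, pitch_um: float) -> List[Point]:
    """Absolute (x, y) dispenser targets for array element (row, col) on each support.

    support_origins_um: hypothetical (x, y) location, in um, of array element
    (0, 0) on each support, as fixed by the holder slots.
    """
    dx, dy = col * pitch_um, row * pitch_um  # same offset on every support
    return [(x0 + dx, y0 + dy) for x0, y0 in support_origins_um]


# Two supports whose arrays start 30 mm apart along x; element (2, 3) at 250 um pitch.
print(target_xy([(0.0, 0.0), (30_000.0, 0.0)], row=2, col=3, pitch_um=250.0))
```

Visiting the returned points in order corresponds to dispensing one reagent at the same array position on each support before the tip is washed.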

IV. Microarray Substrate
This section describes embodiments of a substrate having a microarray of biological polymers carried on the substrate surface. Subsection A describes a multi-cell substrate, each cell of which contains a microarray, and preferably an identical microarray, of distinct biopolymers, such as distinct polynucleotides, formed on a porous surface. Subsection B describes a microarray of distinct polynucleotides bound on a glass slide coated with a polycationic polymer.

A. Multi-Cell Substrate

Fig. 9 illustrates, in plan view, a substrate 110 constructed according to the invention. The substrate has an 8 x 12 rectangular array 112 of cells, such as cells 114, 116, formed on the substrate surface. With reference to Fig. 10, each cell, such as cell 114, in turn supports a microarray 118 of distinct biopolymers, such as polypeptides or polynucleotides, at known, addressable regions of the microarray. Two such regions forming the microarray are indicated at 120, and correspond to regions, such as regions 42, forming the microarray of distinct biopolymers shown in Fig. 3.

The 96-cell array shown in Fig. 9 typically has array dimensions between about 12 and 244 mm in width and between about 8 and 400 mm in length, with the cells in the array having width and length dimensions of 1/12 and 1/8 of the array width and length dimensions, respectively, i.e., between about 1 and 20 mm in width and between about 1 and 50 mm in length.
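The cell dimensions quoted above are the array dimensions divided by the 12 columns and 8 rows of the 96-cell layout. A one-function check (the helper name is hypothetical) is:

```python
from typing import Tuple


def cell_dimensions_mm(array_width_mm: float, array_length_mm: float,
                       cols: int = 12, rows: int = 8) -> Tuple[float, float]:
    """Return (cell width, cell length) in mm for a cols x rows grid of cells."""
    return array_width_mm / cols, array_length_mm / rows


print(cell_dimensions_mm(12, 8))     # smallest array in the stated range
print(cell_dimensions_mm(244, 400))  # largest array in the stated range
```

For the largest array (244 mm x 400 mm) this gives cells of about 20 mm x 50 mm, and for the smallest (12 mm x 8 mm) it gives 1 mm x 1 mm, matching the ranges in the text.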

The construction of the substrate is shown in cross section in Fig. 11, which is an enlarged sectional view taken along view line 124 in Fig. 9. The substrate includes a water-impermeable backing 126, such as a glass slide or rigid polymer sheet. Formed on the surface of the backing is a water-permeable film 128. The film is formed of a porous membrane material, such as nitrocellulose membrane, or a porous web material, such as a nylon, polypropylene, or PVDF porous polymer material. The thickness of the film is preferably between about 10 and 1000 µm. The film may be applied to the backing by spraying or coating uncured material on the backing, or by applying a preformed membrane to the backing. The backing and film may be obtained as a preformed unit from a commercial source, e.g., a plastic-backed nitrocellulose film available from Schleicher and Schuell Corporation.

With continued reference to Fig. 11, the film-covered surface of the substrate is partitioned into a desired array of cells by water-impermeable grid lines, such as lines 130, 132, which have infiltrated the film down to the level of the backing and extend above the surface of the film as shown, typically a distance of 100 to 2000 µm above the film surface.

The grid lines are formed on the substrate by laying down an uncured or otherwise flowable resin or elastomer solution in an array grid, allowing the material to infiltrate the porous film down to the backing, then curing or otherwise hardening the grid lines to form the cell-array substrate.

One preferred material for the grid is a flowable silicone available from Loctite Corporation. The barrier material can be extruded through a narrow syringe (e.g., 22 gauge) using air pressure or mechanical pressure. The syringe is moved relative to the solid support to print the barrier elements as a grid pattern. The extruded bead of silicone wicks into the pores of the solid support and cures to form a shallow waterproof barrier separating the regions of the solid support.



WO 95/35505 PCT/US95/07659



In alternative embodiments, the barrier element can be a wax-based material or a thermoset material such as epoxy. The barrier material can also be a UV-curing polymer which is exposed to UV light after being printed onto the solid support. The barrier material may also be applied to the solid support using printing techniques such as silk-screen printing. The barrier material may also be a heat-seal stamping of the porous solid support which seals its pores and forms a water-impervious barrier element. The barrier material may also be a shallow grid which is laminated or otherwise adhered to the solid support.

In addition to plastic-backed nitrocellulose, the solid support can be virtually any porous membrane with or without a non-porous backing. Such membranes are readily available from numerous vendors and are made from nylon, PVDF, polysulfone and the like. In an alternative embodiment, the barrier element may also be used to adhere the porous membrane to a non-porous backing, in addition to functioning as a barrier to prevent cross-contamination of the assay reagents.

In an alternative embodiment, the solid support can be of a non-porous material. The barrier can be printed either before or after the microarray of biomolecules is printed on the solid support.

As can be appreciated, the cells formed by the grid lines and the underlying backing are water-impermeable, having side barriers projecting above the porous film in the cells. Thus, defined-volume samples can be placed in each well without risk of cross-contamination with sample material in adjacent cells. In Fig. 11, defined-volume samples, such as sample 134, are shown in the cells.

As noted above, each well contains a microarray of distinct biopolymers. In one general embodiment, the microarrays in the wells are identical arrays of distinct biopolymers, e.g., polynucleotides of different sequence. Such arrays can be formed in accordance with the methods described in Section II, by depositing a first selected polynucleotide at the same selected microarray position in each of the cells, then depositing a second polynucleotide at a different microarray position in each well, and so on until a complete, identical microarray is formed in each cell.
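The print order described above (one polynucleotide at a time, placed at the same position in every cell before the next is loaded) can be sketched as a nested loop. This is a minimal illustration under stated assumptions, not the Section II apparatus control: the cell labels A1 to H12, the sample names and the position tuples are hypothetical.

```python
# Sketch of the deposition order for identical microarrays in every cell.
# Cell labels, sample names and positions are hypothetical placeholders.
from itertools import product


def deposition_plan(samples, positions, cells):
    """Yield (sample, cell, position) steps that build identical arrays in all cells.

    samples   -- distinct polynucleotides, in loading order
    positions -- microarray positions (row, col), one per sample
    cells     -- identifiers of the cells on the substrate
    """
    if len(samples) != len(positions):
        raise ValueError("need exactly one microarray position per sample")
    for sample, position in zip(samples, positions):
        # Each sample is dispensed into every cell at the same position
        # before the next sample is handled.
        for cell in cells:
            yield sample, cell, position


cells = [f"{row}{col}" for row, col in product("ABCDEFGH", range(1, 13))]  # 8 x 12 = 96 cells
samples = ["poly1", "poly2", "poly3"]
positions = [(0, 0), (0, 1), (0, 2)]
plan = list(deposition_plan(samples, positions, cells))
print(len(plan))          # 288, i.e. 3 samples x 96 cells
print(plan[0], plan[96])  # first deposit of poly1, then first deposit of poly2
```

Grouping the steps by sample, rather than by cell, means each sample only needs to be handled once per pass over the substrate, which is the order the text describes.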

In a preferred embodiment, each microarray contains about 10^3 distinct polynucleotide or polypeptide biopolymers per surface area of less than about 1 cm^2. Also in a preferred embodiment, the biopolymers in each microarray region are present in a defined amount between about 0.1 femtomoles and 100 nanomoles. The ability to form high-density arrays of biopolymers, where each region is formed of a well-defined amount of deposited material, can be achieved in accordance with the microarray-forming method described in Section II.
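To give a sense of scale, the preferred per-region amounts can be converted into molecule counts with Avogadro's constant. This is a simple unit conversion, not part of the described method.

```python
# Sketch: express the preferred per-region amounts as molecule counts.
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI definition)


def molecules(amount_mol: float) -> float:
    """Number of molecules in `amount_mol` moles."""
    return amount_mol * AVOGADRO


low = molecules(0.1e-15)   # 0.1 femtomole
high = molecules(100e-9)   # 100 nanomoles
print(f"{low:.2e} to {high:.2e} molecules per region")
```

This prints `6.02e+07 to 6.02e+16 molecules per region`, so even the lower bound corresponds to tens of millions of copies of each biopolymer per region.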

Also in a preferred embodiment, the biopolymers are polynucleotides having lengths of at least about 50 bp, i.e., substantially longer than oligonucleotides which can be formed in high-density arrays by schemes involving parallel, step-wise polymer synthesis on the array surface.

In an assay procedure using a polynucleotide array, a small volume of the labeled DNA probe mixture in a standard hybridization solution is loaded onto each cell. The solution spreads to cover the entire microarray and stops at the barrier elements. The solid support is then incubated in a humid chamber at the temperature required by the assay.

Each assay may be conducted in an "open-face" format where no further sealing step is required, since the hybridization solution is kept properly hydrated by the water vapor in the humid chamber. At the conclusion of the incubation step, the entire solid support containing the numerous microarrays is rinsed quickly enough to dilute the assay reagents so that no significant cross-contamination occurs. The entire solid support is then reacted with detection reagents if needed and analyzed using standard colorimetric, radioactive or fluorescent detection means. All processing and detection steps are performed simultaneously on all of the microarrays on the solid support, ensuring uniform assay conditions for all of the microarrays.

B. Glass-Slide Polynucleotide Array 
Fig. 5 shows a substrate 136 formed according to another aspect of the invention and intended for use in detecting binding of labeled polynucleotides to one or more of a plurality of distinct polynucleotides. The substrate includes a glass substrate 138 having formed on its surface a coating of a polycationic polymer, preferably a cationic polypeptide, such as polylysine or polyarginine. Formed on the polycationic coating is a microarray 140 of distinct polynucleotides, each localized at known selected array regions, such as regions 142.

The slide is coated by placing a uniform-thickness film of a polycationic polymer, e.g., poly-L-lysine, on the surface of a slide and drying the film to form a dried coating. The amount of polycationic polymer added is sufficient to form at least a monolayer of polymers on the glass surface. The polymer film is bound to the surface via electrostatic binding between negatively charged silyl-OH groups on the surface and positively charged amine groups in the polymers. Poly-L-lysine coated glass slides may be obtained commercially, e.g., from Sigma Chemical Co. (St. Louis, MO).

To form the microarray, defined volumes of distinct polynucleotides are deposited on the polymer-coated slide, as described in Section II. According to an important feature of the substrate, the deposited polynucleotides remain bound non-covalently to the coated slide surface when an aqueous DNA sample is applied to the substrate under conditions which allow hybridization of reporter-labeled polynucleotides in the sample to complementary-sequence (single-stranded) polynucleotides in the substrate array. The method is illustrated in Examples 1 and 2.

To illustrate this feature, a substrate of the type just described, but having an array of same-sequence polynucleotides, was incubated with fluorescently labeled complementary DNA under hybridization conditions. After washing to remove non-hybridized material, the substrate was examined by low-power fluorescence microscopy. The array can be visualized by the relatively uniform labeling pattern of the array regions.

In a preferred embodiment, each microarray contains at least 10^3 distinct polynucleotide or polypeptide biopolymers per surface area of less than about 1 cm^2. In the embodiment shown in Fig. 5, the microarray contains 400 regions in an area of about 16 mm^2, or 2.5 x 10^3 regions/cm^2. Also in a preferred embodiment, the polynucleotides in each microarray region are present in a defined amount between about 0.1 femtomoles and 100 nanomoles. As above, the ability to form high-density arrays of this type, where each region is formed of a well-defined amount of deposited material, can be achieved in accordance with the microarray-forming method described in Section II.
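The density figure for Fig. 5 is a unit conversion (1 cm^2 = 100 mm^2). The sketch below reproduces it and, assuming the 400 regions form a square 20 x 20 grid (the text gives only the count and the area), estimates the implied center-to-center spacing.

```python
# Sketch: areal density and implied spacing of the Fig. 5 microarray.
# The square 20 x 20 layout used for the spacing estimate is an assumption;
# the text gives only the region count (400) and area (about 16 mm^2).
import math

MM2_PER_CM2 = 100.0


def regions_per_cm2(n_regions: int, area_mm2: float) -> float:
    return n_regions * MM2_PER_CM2 / area_mm2


def square_grid_pitch_um(n_regions: int, area_mm2: float) -> float:
    """Center-to-center spacing if the regions tile a square area as a square grid."""
    regions_per_side = math.sqrt(n_regions)
    side_um = math.sqrt(area_mm2) * 1000.0
    return side_um / regions_per_side


density = regions_per_cm2(400, 16.0)
print(density, density >= 1e3)           # 2500.0 True, i.e. 2.5 x 10^3 regions/cm^2
print(square_grid_pitch_um(400, 16.0))   # 200.0 (micrometers)
```

Under the square-grid assumption the regions sit about 200 µm apart, comparable to the 380 µm spacing used in Example 1 and the 400 µm spacing used in Example 3.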

Also in a preferred embodiment, the polynucleotides have lengths of at least about 50 bp, i.e., substantially longer than oligonucleotides which can be formed in high-density arrays by various in situ synthesis schemes.



V. Utility 

Microarrays of immobilized nucleic acid sequences prepared in accordance with the invention can be used for large-scale hybridization assays in numerous genetic applications, including genetic and physical mapping of genomes, monitoring of gene expression, DNA sequencing, genetic diagnosis, genotyping of organisms, and distribution of DNA reagents to researchers.

For gene mapping, a gene or a cloned DNA fragment is hybridized to an ordered array of DNA fragments, and the identity of the DNA elements applied to the array is unambiguously established by the pixel or pattern of pixels of the array that are detected. One application of such arrays for creating a genetic map is described by Nelson, et al. (1993). In constructing physical maps of the genome, arrays of immobilized cloned DNA fragments are hybridized with other cloned DNA fragments to establish whether the cloned fragments in the probe mixture overlap and are therefore contiguous to the immobilized clones on the array. For example, Lehrach, et al., describe such a process.

The arrays of immobilized DNA fragments may also be used for genetic diagnostics. To illustrate, an array containing multiple forms of a mutated gene or genes can be probed with a labeled mixture of a patient's DNA which will preferentially interact with only one of the immobilized versions of the gene.

The detection of this interaction can lead to a medical diagnosis. Arrays of immobilized DNA fragments can also be used in DNA probe diagnostics. For example, the identity of a pathogenic microorganism can be established unambiguously by hybridizing a sample of the unknown pathogen's DNA to an array containing many types of known pathogenic DNA. A similar technique can also be used for unambiguous genotyping of any organism. Other molecules of genetic interest, such as cDNAs and RNAs, can be immobilized on the array or alternately used as the labeled probe mixture that is applied to the array.

In one application, an array of cDNA clones representing genes is hybridized with total cDNA from an organism to monitor gene expression for research or diagnostic purposes. Labeling total cDNA from a normal cell with one color of fluorophore and total cDNA from a diseased cell with another, and simultaneously hybridizing the two cDNA samples to the same array of cDNA clones, allows differential gene expression to be measured as the ratio of the two fluorophore intensities. This two-color experiment can be used to monitor gene expression in different tissue types, disease states, response to drugs, or response to environmental factors. An example of this approach is illustrated in Example 2, described with respect to Fig. 8.
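The ratio measurement described above can be sketched as a per-spot calculation. This is a minimal illustration, not the analysis software used in the examples: the clone names and intensities are hypothetical, and the background subtraction and log2 scale are assumed conventions rather than steps stated in the text.

```python
# Sketch: differential expression as the ratio of two fluorophore intensities per spot.
# Clone names and intensities are hypothetical; background subtraction and the log2
# scale are assumed conventions, not steps described in the text.
import math


def expression_log2_ratios(spots, background=(0.0, 0.0)):
    """spots: {clone_id: (intensity_a, intensity_b)} -> {clone_id: log2(a / b) or None}."""
    bg_a, bg_b = background
    ratios = {}
    for clone, (a, b) in spots.items():
        a_corr, b_corr = a - bg_a, b - bg_b
        if a_corr <= 0 or b_corr <= 0:
            ratios[clone] = None  # not measurable in one or both channels
        else:
            ratios[clone] = math.log2(a_corr / b_corr)
    return ratios


spots = {
    "gene_1": (1200.0, 1180.0),   # similar in both samples
    "gene_2": (4000.0, 500.0),    # higher in sample A
    "gene_3": (300.0, 2400.0),    # higher in sample B
}
print(expression_log2_ratios(spots, background=(100.0, 100.0)))
```

A log ratio near zero corresponds to equal expression in the two samples (the yellow spots of Example 2), while large positive or negative values flag genes expressed preferentially in one sample.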

By way of example and without implying a limitation of scope, such a procedure could be used to simultaneously screen many patients against all known mutations in a disease gene. This invention could be used in the form of, for example, 96 identical 0.9 cm x 2.2 cm microarrays fabricated on a single 12 cm x 18 cm sheet of plastic-backed nitrocellulose, where each microarray could contain, for example, 100 DNA fragments representing all known mutations of a given gene. The region of interest from each of the DNA samples from 96 patients could be amplified, labeled, and hybridized to the 96 individual arrays, with each assay performed in 100 microliters of hybridization solution. The approximately 1 thick silicone rubber barrier elements between individual arrays prevent cross-contamination of the patient samples by sealing the pores of the nitrocellulose and by acting as a physical barrier between each microarray. The solid support containing all 96 microarrays assayed with the 96 patient samples is incubated, rinsed, detected and analyzed as a single sheet of material using standard radioactive, fluorescent, or colorimetric detection means (Maniatis, et al., 1989). Previously, such a procedure would involve the handling, processing and tracking of 96 separate membranes in 96 separate sealed chambers. By processing all 96 arrays as a single sheet of material, significant time and cost savings are possible.
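The example dimensions can be checked for consistency with a little arithmetic. The sketch below tries two even-pitch grid orientations for the 96 arrays on the 12 cm x 18 cm sheet and totals the hybridization volume; the orientations and the even pitch are assumptions, since the text gives only the sizes.

```python
# Sketch: check which grid orientation lets 96 arrays of 0.9 cm x 2.2 cm fit on a
# 12 cm x 18 cm sheet. The text gives only these sizes; the orientations tried and
# the even-pitch layout are assumptions for illustration.


def grid_fits(sheet_w_cm, sheet_l_cm, n_across, n_along, array_w_cm, array_l_cm):
    """Return (fits, (pitch_w, pitch_l)) for an evenly pitched n_across x n_along grid.
    Space left over at each position is available for the barrier elements."""
    pitch_w = sheet_w_cm / n_across
    pitch_l = sheet_l_cm / n_along
    return (pitch_w >= array_w_cm and pitch_l >= array_l_cm), (pitch_w, pitch_l)


for n_across, n_along in [(12, 8), (8, 12)]:
    ok, pitch = grid_fits(12.0, 18.0, n_across, n_along, 0.9, 2.2)
    print(n_across, "x", n_along, "fits" if ok else "does not fit", pitch)

total_hyb_ml = 96 * 100 / 1000  # 96 assays x 100 microliters each, in milliliters
print(total_hyb_ml)
```

Under these assumptions, only the 12-across by 8-along layout fits, giving 1.0 cm x 2.25 cm per position and leaving 1 mm and 0.5 mm margins for the barrier elements. The whole 96-patient run uses about 9.6 ml of hybridization solution.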

The assay format can be reversed, with the patient's or organism's DNA immobilized as the array elements and each array hybridized with a different mutated allele or genetic marker. The gridded solid support can also be used for parallel non-DNA ELISA assays. Furthermore, the invention allows for the use of all standard detection methods without the need to remove the shallow barrier elements to carry out the detection step.

In addition to the genetic applications listed above, arrays of whole cells, peptides, enzymes, antibodies, antigens, receptors, ligands, phospholipids, polymers, drug congener preparations or described in this invention for large-scale screening assays in medical diagnostics, drug discovery, molecular biology, immunology and toxicology.

The multi-cell substrate aspect of the invention allows for the rapid and convenient screening of many DNA probes against many ordered arrays of DNA fragments. This eliminates the need to handle and detect many individual arrays when performing mass screenings for genetic research and diagnostic applications. Numerous microarrays can be fabricated on the same solid support, and each microarray reacted with a different DNA probe, while the solid support is processed as a single sheet of material.

The following examples illustrate, but are in no way intended to limit, the present invention.

Example 1 

Genomic-Complexity Hybridization to Micro
DNA Arrays Representing the Yeast
Saccharomyces cerevisiae Genome with
Two-Color Fluorescent Detection

The array elements were randomly amplified PCR products (Bohlander, et al., 1992) generated using physically mapped lambda clones of S. cerevisiae genomic DNA as templates (Riles, et al., 1993). The PCR was performed directly on the lambda phage lysates, resulting in amplification of both the 35 kb lambda vector and the 5-15 kb yeast insert sequences in the form of a uniform distribution of PCR products between 250 and 1500 base pairs in length. The PCR product was purified using Sephadex G50 gel filtration (Pharmacia, Piscataway, NJ) and concentrated by evaporation to dryness at room temperature overnight. Each of the 864 amplified lambda clones was rehydrated in 15 µl of 3 x SSC in preparation for spotting onto the glass.

The microarrays were fabricated on microscope slides which were coated with a layer of poly-L-lysine (Sigma). The automated apparatus described in Section IV loaded 1 µl of the concentrated lambda clone PCR product in 3 x SSC directly from 96-well storage plates into the open capillary printing element and deposited ~5 nl of sample per slide, at 380 micron spacing between spots, on each of 40 slides. The process was repeated for all 864 samples and 8 control spots. After the spotting operation was complete, the slides were rehydrated in a humid chamber for 2 hours, baked in a dry 80°C vacuum oven for 2 hours, rinsed to remove unabsorbed DNA, and then treated with succinic anhydride to reduce non-specific adsorption of the labeled hybridization probe to the poly-L-lysine coated glass surface. Immediately prior to use, the immobilized DNA on the array was denatured in distilled water at 90°C for 2 minutes.
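The figures in this print run can be cross-checked: one 1 µl loading is spotted onto all 40 slides before the next sample is loaded, and 872 spots are placed per slide at 380 µm spacing. The sketch below does that bookkeeping; the square-grid layout used to estimate the printed footprint is an assumption, since the text does not give the array geometry.

```python
# Sketch: volume and layout bookkeeping for the Example 1 print run.
# The square-grid layout used for the footprint estimate is an assumption.
import math

LOAD_UL = 1.0        # volume loaded into the capillary for each sample
SPOT_NL = 5.0        # approximate volume deposited per slide
SLIDES = 40
SAMPLES = 864
CONTROL_SPOTS = 8
PITCH_UM = 380

# One loading is spotted onto all 40 slides before the next sample is loaded.
used_nl = SPOT_NL * SLIDES
print(used_nl, "nl used of", LOAD_UL * 1000, "nl loaded")

spots_per_slide = SAMPLES + CONTROL_SPOTS                 # samples plus controls
per_side = math.ceil(math.sqrt(spots_per_slide))          # spots per row/column
footprint_mm = (per_side - 1) * PITCH_UM / 1000           # first to last spot center
print(spots_per_slide, per_side, footprint_mm)
```

About 200 nl of each 1 µl load is consumed, and a 30 x 30 grid at 380 µm spacing spans roughly 11 mm, which fits within the 1.8 x 1.8 cm area scanned later in this example.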

For the pooled chromosome experiment, the 16 chromosomes of Saccharomyces cerevisiae were separated in a CHEF agarose gel apparatus (Biorad, Richmond, CA). The six largest chromosomes were isolated in one gel slice and the ten smallest chromosomes in a second gel slice. The DNA was recovered using a gel extraction kit (Qiagen, Chatsworth, CA). The two chromosome pools were randomly amplified in a manner similar to that used for the target lambda clones. Following amplification, 5 micrograms of each of the amplified chromosome pools were separately random-primer labeled using Klenow polymerase (Amersham, Arlington Heights, IL), with a lissamine-conjugated nucleotide analog (Dupont NEN, Boston, MA) for the pool containing the six largest chromosomes and with a fluorescein-conjugated nucleotide analog (BMB) for the pool containing the ten smallest chromosomes. The two pools were mixed and concentrated using an ultrafiltration device (Amicon, Danvers, MA).
Five micrograms of the hybridization probe, consisting of both chromosome pools in 7.5 µl of TE, was denatured in a boiling water bath and then snap-cooled on ice. Then 2.5 µl of concentrated hybridization solution (5 x SSC and 0.1% SDS) was added, and all 10 µl were transferred to the array surface, covered with a cover slip, placed in a custom-built single-slide humidity chamber and incubated at 60°C for 12 hours. The slides were then rinsed at room temperature in 0.1 x SSC and 0.1% SDS for 5 minutes, cover-slipped and scanned.

A custom-built laser fluorescence scanner was used to detect the two-color hybridization signals from the 1.8 x 1.8 cm array at 20 micron resolution. The scanned image was gridded and analyzed using custom image analysis software. After correcting for optical crosstalk between the fluorophores due to their overlapping emission spectra, the red and green hybridization values for each clone on the array were correlated to the known physical map position of the clone, resulting in a computer-generated color karyotype of the yeast genome.
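A common way to perform the crosstalk correction mentioned above is linear unmixing: model each measured channel as the true signal plus a fixed fraction of the other dye, and then invert the 2 x 2 system. The text does not say how the custom software did it, so the linear model and the coefficients below are assumptions for illustration only.

```python
# Sketch: linear unmixing to correct two-channel intensities for optical crosstalk.
# The crosstalk coefficients are hypothetical calibration values, not figures from
# the text, and a 2 x 2 linear model is an assumed form of the correction.


def unmix(measured_green, measured_red, red_into_green=0.05, green_into_red=0.10):
    """Invert  G_m = G + red_into_green * R
               R_m = green_into_red * G + R
    and return the corrected (G, R)."""
    det = 1.0 - red_into_green * green_into_red
    if abs(det) < 1e-12:
        raise ValueError("crosstalk matrix is singular")
    green = (measured_green - red_into_green * measured_red) / det
    red = (measured_red - green_into_red * measured_green) / det
    return green, red


# A purely green-labeled spot of true intensity 1000 reads 1000 (green) and 100 (red).
print(unmix(1000.0, 100.0))
```

In practice the two coefficients would be calibrated from spots known to carry only one dye; once they are known, the correction is a fixed, cheap per-spot operation.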

Figure 6 shows the hybridization pattern of the two chromosome pools. A red signal indicates that the lambda clone on the array surface contains a cloned genomic DNA segment from one of the six largest yeast chromosomes. A green signal indicates that the lambda clone insert comes from one of the ten smallest yeast chromosomes. Orange signals indicate repetitive sequences which cross-hybridized to both chromosome pools. Control spots on the array confirm that the hybridization is specific and reproducible.

The physical map locations of the genomic DNA fragments contained in each of the clones used as array elements have been previously determined by Olson and co-workers (Riles, et al.), allowing for the automatic generation of the color karyotype shown in Figure 7. The color of a chromosomal section on the karyotype corresponds to the color of the array element containing the clone from that section. The black regions of the karyotype represent false-negative dark spots on the array (10%) or regions of the genome not covered by the Olson clone library (90%). Note that the six largest chromosomes are mainly red while the ten smallest chromosomes are mainly green, matching the original CHEF gel isolation of the hybridization probe. Areas of the red chromosomes containing green spots, and vice versa, are probably due to spurious sample-tracking errors in the formation of the original library and in the amplification and spotting procedures.
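The red, green, orange and dark calls used to build the karyotype can be sketched as a simple rule on the corrected intensities of each clone. The signal threshold and the dominance factor below are illustrative assumptions, and the clone names and intensities are hypothetical; the text does not state the criteria the custom software used.

```python
# Sketch: call each clone red, green, orange (both pools) or dark from its corrected
# two-channel intensities. Thresholds, clone names and values are assumptions.


def call_color(green, red, min_signal=200.0, dominance=3.0):
    if green < min_signal and red < min_signal:
        return "dark"      # no detectable hybridization
    if red >= dominance * green:
        return "red"       # insert from one of the six largest chromosomes
    if green >= dominance * red:
        return "green"     # insert from one of the ten smallest chromosomes
    return "orange"        # signal in both channels, e.g. repetitive sequence


clone_intensities = {        # (green, red), hypothetical values
    "lambda_001": (150.0, 2400.0),
    "lambda_002": (1800.0, 90.0),
    "lambda_003": (1500.0, 1300.0),
    "lambda_004": (40.0, 60.0),
}
calls = {clone: call_color(g, r) for clone, (g, r) in clone_intensities.items()}
print(calls)
```

This prints `{'lambda_001': 'red', 'lambda_002': 'green', 'lambda_003': 'orange', 'lambda_004': 'dark'}`. Mapping each call onto the clone's known chromosomal position then yields a colored karyotype of the kind described for Figure 7.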

The yeast genome arrays have also been probed with individual clones or pools of clones that are fluorescently labeled for physical mapping purposes. The hybridization signals of these clones to the array were translated into a position on the physical map of yeast.



Example 2 

Total cDNA Hybridized to Micro Arrays of
cDNA Clones with Two-Color
Fluorescent Detection



Twenty-four clones containing cDNA inserts from the plant Arabidopsis were amplified using PCR. Salt was added to the purified PCR products to a final concentration of 3 x SSC. The cDNA clones were spotted on poly-L-lysine coated microscope slides in a manner similar to Example 1. Among the cDNA clones was a clone representing the transcription factor HAT4, which had previously been used to create a transgenic line of the plant Arabidopsis in which this gene is present at ten times the level found in wild-type Arabidopsis (Schena, et al., 1992).

Total poly-A mRNA from wild-type Arabidopsis was isolated using standard methods (Maniatis, et al., 1989) and reverse transcribed into total cDNA, using a fluorescein nucleotide analog to label the cDNA product (green fluorescence). A similar procedure was performed with the transgenic line of Arabidopsis, in which the transcription factor HAT4 was inserted into the genome using standard gene transfer protocols. cDNA copies of mRNA from the transgenic plant were labeled with a lissamine nucleotide analog (red fluorescence). Two micrograms of the cDNA products from each type of plant were pooled together and hybridized to the cDNA clone array in a 10 microliter hybridization reaction in a manner similar to Example 1. Rinsing and detection of hybridization were also performed in a manner similar to Example 1. Fig. 8 shows the resulting hybridization pattern of the array.

Genes equally expressed in wild-type and transgenic Arabidopsis appeared yellow due to equal contributions of the green and red fluorescence to the final signal. The dots have different intensities of yellow, indicating various levels of gene expression. The cDNA clone representing the transcription factor HAT4, expressed in the transgenic line of Arabidopsis but not detectably expressed in wild-type Arabidopsis, appears as a red dot (indicated by the arrow), indicating the preferential expression of the transcription factor in the red-labeled transgenic Arabidopsis and its relative lack of expression in the green-labeled wild-type Arabidopsis.

An advantage of the microarray hybridization format for gene expression studies is the high partial concentration of each cDNA species achievable in the 10 microliter hybridization reaction. This high partial concentration allows for detection of rare transcripts without the need for PCR amplification of the hybridization probe, which may bias the true genetic representation of each discrete cDNA species.
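The partial-concentration argument is simple arithmetic: a fixed mass of cDNA in a smaller volume gives each species a proportionally higher concentration. The sketch uses the 2 + 2 micrograms of pooled cDNA and the 10 microliter volume from this example. The 1-in-100,000 abundance for a rare transcript is a hypothetical figure, and the 100 microliter comparison volume is chosen only for contrast.

```python
# Sketch: partial concentration of a single cDNA species in the Example 2 reaction.
# The 1-in-100,000 abundance is a hypothetical figure for a rare transcript.
TOTAL_CDNA_UG = 2.0 + 2.0   # 2 micrograms from each plant, pooled
REACTION_UL = 10.0


def species_conc_pg_per_ul(fraction, total_ug=TOTAL_CDNA_UG, volume_ul=REACTION_UL):
    """Concentration (pg/ul) of a species making up `fraction` of the total cDNA mass."""
    total_pg = total_ug * 1e6          # 1 microgram = 10^6 picograms
    return fraction * total_pg / volume_ul


print(species_conc_pg_per_ul(1e-5))                   # in the 10 ul microarray reaction
print(species_conc_pg_per_ul(1e-5, volume_ul=100.0))  # same cDNA diluted into 100 ul
```

Under these assumptions the rare species is present at about 4 pg/µl in the 10 µl reaction, tenfold higher than the same amount of cDNA diluted into 100 µl.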

Gene expression studies such as these can be used for genomics research to discover which genes are expressed in which cell types, disease states, developmental states or environmental conditions. Gene expression studies can also be used for diagnosis of disease by empirically correlating gene expression patterns to disease states.

Example 3 

Multiplexed Colorimetric Hybridization on
a Gridded Solid Support

A sheet of plastic-backed nitrocellulose was gridded with barrier elements made from silicone rubber according to the description in Section IV-A. The sheet was soaked in 10 x SSC and allowed to dry. As shown in Fig. 12, 192 M13 clones, each with a different yeast insert, were arrayed 400 microns apart in four quadrants of the solid support using the automated device described in Section III. The bottom left quadrant served as a negative control for hybridization, while each of the other three quadrants was hybridized simultaneously with a different oligonucleotide using the open-face hybridization technology described in Section IV-A. The first two and last four elements of each array are positive controls for the colorimetric detection step.

The oligonucleotides were labeled with fluorescein, which was detected using an anti-fluorescein antibody conjugated to alkaline phosphatase that precipitated an NBT/BCIP dye on the solid support (Amersham). Perfect matches between the labeled oligos and the M13 clones resulted in dark spots visible to the naked eye and detected using an optical scanner (HP ScanJet II) attached to a personal computer. The hybridization patterns are different in every quadrant, indicating that each oligo found several unique M13 clones among the 192 with a perfect sequence match. Note that the open capillary printing tip leaves detectable dimples on the nitrocellulose which can be used to automatically align and analyze the images.

Although the invention has been described with respect to specific embodiments and methods, it will be clear that various changes and modifications may be made without departing from the invention.
IT IS CLAIMED:



1. A method of forming a microarray of analyte-assay regions on a solid support, where each region in the array has a known amount of a selected, analyte-specific reagent, said method comprising,

(a) loading a solution of a selected analyte-specific reagent in a reagent-dispensing device having an elongate capillary channel (i) formed by spaced-apart, coextensive elongate members, (ii) adapted to hold a quantity of the reagent solution and (iii) having a tip region at which aqueous solution in the channel forms a meniscus,

(b) tapping the tip of the dispensing device against a solid support at a defined position on the surface, with an impulse effective to break the meniscus in the capillary channel and deposit a selected volume of solution on the surface, and

(c) repeating steps (a) and (b) until said array is formed.



2. The method of claim 1, wherein said tapping is carried out with an impulse effective to deposit a selected volume in the volume range between 0.01 and 100 nl.

3. The method of claim 1, wherein said channel is 
formed by a pair of spaced-apart tapered elements. 

4. The method of claim 1, for forming a plurality of such arrays, wherein step (b) is applied to a selected position on each of a plurality of solid supports at each repeat cycle proceeding step (c).

5. The method of claim 1, which further includes, after performing steps (a) and (b) at least one time, reloading the reagent-dispensing device with a new reagent solution by the steps of (i) dipping the capillary channel of the device in a wash solution, (ii) removing wash solution drawn into the capillary channel, and (iii) dipping the capillary channel into the new reagent solution.

6. Automated apparatus for forming a microarray of analyte-assay regions on a plurality of solid supports, where each region in the array has a known amount of a selected, analyte-specific reagent, said apparatus comprising

(a) a holder for holding, at known positions, a plurality of planar supports,

(b) a reagent dispensing device having an open capillary channel (i) formed by spaced-apart, coextensive elongate members, (ii) adapted to hold a quantity of the reagent solution and (iii) having a tip region at which aqueous solution in the channel forms a meniscus,

(c) positioning means for positioning the dispensing device at a selected array position with respect to a support in said holder,

(d) dispensing means for moving the device into tapping engagement against a support with a selected impulse, when the device is positioned at a defined array position with respect to that support, with an impulse effective to break the meniscus of liquid in the capillary channel and deposit a selected volume of solution on the surface, and

(e) control means for controlling said positioning and dispensing means.

7. The apparatus of claim 6, wherein said dispensing means is effective to move said dispensing device against a support with an impulse effective to deposit a selected volume in the volume range between 0.01 and 100 nl.

8. The apparatus of claim 6, wherein said channel 
is formed by a pair of spaced-apart tapered elements. 

9. The apparatus of claim 6, wherein the control means operates to (i) place the dispensing device at a loading station, (ii) move the capillary channel in the device into a selected reagent at the loading station, to load the dispensing device with the reagent, and (iii) dispense the reagent at a defined array position on each of the supports on said holder.

10. The apparatus of claim 6, wherein the control device further operates, at the end of a dispensing cycle, to wash the dispensing device by (i) placing the dispensing device at a washing station, (ii) moving the capillary channel in the device into a wash fluid, to load the dispensing device with the fluid, and (iii) removing the wash fluid prior to loading the dispensing device with a fresh selected reagent.

11. The apparatus of claim 6, wherein said device is one of a plurality of such devices which are carried on the arm for dispensing different analyte assay reagents at selected spaced array positions.



12. A substrate with a surface having a microarray of at least 10^3 distinct polynucleotide or polypeptide biopolymers per 1 cm^2 surface area, each distinct biopolymer sample (i) being disposed at a separate, defined position in said array, (ii) having a length of at least 50 subunits, and (iii) being present in a defined amount between about 0.1 femtomole and 100 nanomoles.

13. The substrate of claim 12, wherein said surface is a glass slide coated with polylysine, and said biopolymers are polynucleotides.

14. The substrate of claim 12, wherein said substrate has a water-impermeable backing, a water-permeable film formed on the backing, and a grid formed on the film, where said grid (i) is composed of intersecting water-impervious grid elements extending from said backing to positions raised above the surface of said film, and (ii) partitions the film into a plurality of water-impervious cells, where each cell contains such a biopolymer array.

15. A substrate with a surface array of sample-receiving cells, comprising

a water-impermeable backing,

a water-permeable film formed on the backing, and

a grid formed on the film, said grid being composed of intersecting water-impervious grid elements extending from said backing to positions raised above the surface of said film.

16. The substrate of claim 15, wherein the cells of the array each contain an array of biopolymers.



17. A substrate for use in detecting binding of labeled biopolymers to one or more of a plurality of distinct polynucleotides, comprising

a non-porous, glass substrate,

a coating of a cationic polymer on said substrate, and

an array of distinct polynucleotides to said coating, where each biopolymer is disposed at a separate, defined position in a surface array of biopolymers.

18. A method of detecting differential expression
of each of a plurality of genes in a first cell type
with respect to expression of the same genes in a
second cell type, said method comprising

producing fluorescence-labeled cDNAs from mRNAs
isolated from the two cell types, where the cDNAs
from the first and second cells are labeled with first
and second different fluorescent reporters,

adding a mixture of the labeled cDNAs from the
two cell types to an array of polynucleotides
representing a plurality of known genes derived from
the two cell types, under conditions that result in
hybridization of the cDNAs to complementary-sequence
polynucleotides in the array; and

examining the array by fluorescence under
fluorescence excitation conditions in which (i)
polynucleotides in the array that are hybridized
predominantly to cDNAs derived from one of the first
and second cell types give a distinct first or second
fluorescence emission color, respectively, and (ii)
polynucleotides in the array that are hybridized to
substantially equal numbers of cDNAs derived from the
first and second cell types give a distinct combined
fluorescence emission color,

wherein the relative expression of known genes in
the two cell types can be determined by the observed
fluorescence emission color of each spot.

19. The method of claim 18, wherein the array of
polynucleotides is formed on a substrate with a surface
having an array of at least 10^3 distinct polynucleotide
or polypeptide biopolymers in a surface area of less
than about 1 cm^2, each distinct biopolymer (i) being
disposed at a separate, defined position in said array,
(ii) having a length of at least 50 subunits, and (iii)
being present in a defined amount between about 0.1
femtomole and 100 nmoles.

20. The method of claim 19, wherein said surface
is a glass slide coated with polylysine, and said
biopolymers are polynucleotides non-covalently bound to
said polylysine.
METHODS FOR FABRICATING
MICROARRAYS OF BIOLOGICAL SAMPLES 

CROSS-REFERENCE TO RELATED
APPLICATION

This application is a continuation-in-part of U.S. patent application Ser. No. 08/261,388, filed Jun. 17, 1994, and now abandoned.

The United States government may have certain rights in the present invention pursuant to Grant No. HG00450 awarded by the National Institutes of Health.

FIELD OF THE INVENTION

This invention relates to a method and apparatus for fabricating microarrays of biological samples for large scale screening assays, such as arrays of DNA samples to be used in DNA hybridization assays for genetic research and diagnostic applications.

REFERENCES 

Abouzied, et al., Journal of AOAC International 77(2):495-500 (1994).

Bohlander, et al., Genomics 13:1322-1324 (1992).

Drmanac, et al., Science 260:1649-1652 (1993).

Fodor, et al., Science 251:767-773 (1991).

Khrapko, et al., DNA Sequence 1:375-388 (1991).

Kuriyama, et al., AN ISFET BIOSENSOR, APPLIED BIOSENSORS (Donald Wise, Ed.), Butterworths, pp. 93-114 (1989).

Lehrach, et al., HYBRIDIZATION FINGERPRINTING IN GENOME MAPPING AND SEQUENCING, GENOME ANALYSIS, VOL. 1 (Davies and Tilghman, Eds.), Cold Spring Harbor Press, pp. 39-81 (1990).

Maniatis, et al., MOLECULAR CLONING, A LABORATORY MANUAL, Cold Spring Harbor Press (1989).

Nelson, et al., Nature Genetics 4:11-18 (1993).

Pirrung, et al., U.S. Pat. No. 5,143,854 (1992).

Riles, et al., Genetics 134:81-150 (1993).

Schena, M., et al., Proc. Nat. Acad. Sci. USA 89:3894-3898 (1992).

Southern, et al., Genomics 13:1008-1017 (1992).

BACKGROUND OF THE INVENTION 

A variety of methods are currently available for making arrays of biological macromolecules, such as arrays of nucleic acid molecules or proteins. One method for making ordered arrays of DNA on a porous membrane is a "dot blot" approach. In this method, a vacuum manifold transfers a plurality, e.g., 96, aqueous samples of DNA from 3 millimeter diameter wells to a porous membrane. A common variant of this procedure is a "slot-blot" method in which the wells have highly-elongated oval shapes.

The DNA is immobilized on the porous membrane by baking the membrane or exposing it to UV radiation. This is a manual procedure practical for making one array at a time and usually limited to 96 samples per array. "Dot-blot" procedures are therefore inadequate for applications in which many thousand samples must be determined.

A more efficient technique employed for making ordered arrays of genomic fragments uses an array of pins dipped into the wells, e.g., the 96 wells of a microtitre plate, for transferring an array of samples to a substrate, such as a porous membrane. One array includes pins that are designed to spot a membrane in a staggered fashion, for creating an array of 9216 spots in a 22x22 cm area (Lehrach, et al., 1990). A limitation with this approach is that the volume of DNA spotted in each pixel of each array is highly variable. In addition, the number of arrays that can be made with each dipping is usually quite small.

An alternate method of creating ordered arrays of nucleic acid sequences is described by Pirrung, et al. (1992), and also by Fodor, et al. (1991). The method involves synthesizing different nucleic acid sequences at different discrete regions of a support. This method employs elaborate synthetic schemes, and is generally limited to relatively short nucleic acid samples, e.g., less than 20 bases. A related method has been described by Southern, et al. (1992).

Khrapko, et al. (1991) describes a method of making an oligonucleotide matrix by spotting DNA onto a thin layer of polyacrylamide. The spotting is done manually with a micropipette.

None of the methods or devices described in the prior art are designed for mass fabrication of microarrays characterized by (i) a large number of micro-sized assay regions separated by a distance of 50-200 microns or less, and (ii) a well-defined amount, typically in the picomole range, of analyte associated with each region of the array.

Furthermore, current technology is directed at performing such assays one at a time on a single array of DNA molecules. For example, the most common method for performing DNA hybridizations to arrays spotted onto porous membrane involves sealing the membrane in a plastic bag (Maniatis, et al., 1989) or a rotating glass cylinder (Robbins Scientific) with the labeled hybridization probe inside the sealed chamber. For arrays made on non-porous surfaces, such as a microscope slide, each array is incubated with the labeled hybridization probe sealed under a coverslip. These techniques require a separate sealed chamber for each array, which makes the screening and handling of many such arrays inconvenient and time intensive.
Abouzied, et al. (1994) describes a method of printing horizontal lines of antibodies on a nitrocellulose membrane and separating regions of the membrane with vertical stripes of a hydrophobic material. Each vertical stripe is then reacted with a different antigen, and the reaction between the immobilized antibody and an antigen is detected using a standard ELISA colorimetric technique. Abouzied's technique makes it possible to screen many one-dimensional arrays simultaneously on a single sheet of nitrocellulose. Abouzied makes the nitrocellulose somewhat hydrophobic using a line drawn with PAP Pen (Research Products International). However, Abouzied does not describe a technology that is capable of completely sealing the pores of the nitrocellulose. The pores of the nitrocellulose are still physically open, and so the assay reagents can leak through the hydrophobic barrier during extended high temperature incubations or in the presence of detergents, which makes the Abouzied technique unacceptable for DNA hybridization assays.

Porous membranes with printed patterns of hydrophilic/hydrophobic regions exist for applications such as ordered arrays of bacteria colonies. QA Life Sciences (San Diego, Calif.) makes such a membrane with a grid pattern printed on it. However, this membrane has the same disadvantage as the Abouzied technique, since reagents can still flow between the gridded arrays, making them unusable for separate DNA hybridization assays.

Pall Corporation makes a 96-well plate with a porous filter heat sealed to the bottom of the plate. These plates are capable of containing different reagents in each well without cross-contamination. However, each well is intended to hold only one target element, whereas the invention described here makes a microarray of many biomolecules in each subdivided region of the solid support. Furthermore, the 96-well plates are at least 1 cm thick, which prevents the use of the device for many colorimetric, fluorescent and radioactive detection formats that require the membrane to be flat against the detection surface. The invention described here requires no further processing after the assay step, since the barrier elements are shallow and do not interfere with the detection step, thereby greatly increasing convenience.

Hyseq Corporation has described a method of making an "array of arrays" on a non-porous solid support for use with their sequencing by hybridization technique. The method described by Hyseq involves modifying the chemistry of the solid support material to form a hydrophobic grid pattern where each subdivided region contains a microarray of biomolecules. Hyseq's flat hydrophobic pattern does not make use of physical blocking as an additional means of preventing cross contamination.

SUMMARY OF THE INVENTION

The invention includes, in one aspect, a method of forming a microarray of analyte-assay regions on a solid support, where each region in the array has a known amount of a selected, analyte-specific reagent. The method involves first loading a solution of a selected analyte-specific reagent in a reagent-dispensing device having an elongate capillary channel (i) formed by spaced-apart, coextensive, elongate members, (ii) adapted to hold a quantity of the reagent solution, and (iii) having a tip region at which aqueous solution in the channel forms a meniscus. The channel is preferably formed by a pair of spaced-apart tapered elements.

The tip of the dispensing device is tapped against a solid support at a defined position on the support surface with an impulse effective to break the meniscus in the capillary channel and deposit a selected volume of solution on the surface. The two steps are repeated until the desired array is formed.

The method may be practiced in forming a plurality of such arrays, where the solution-depositing step is applied to a selected position on each of a plurality of solid supports at each repeat cycle.

The dispensing device may be loaded with a new solution by the steps of (i) dipping the capillary channel of the device in a wash solution, (ii) removing wash solution drawn into the capillary channel, and (iii) dipping the capillary channel into the new reagent solution.

Also included in the invention is an automated apparatus for forming a microarray of analyte-assay regions on a plurality of solid supports, where each region in the array has a known amount of a selected, analyte-specific reagent. The apparatus has a holder for holding, at known positions, a plurality of planar supports, and a reagent-dispensing device of the type described above.

The apparatus further includes a positioning structure for positioning the dispensing device at a selected array position with respect to a support in said holder, and a dispensing structure for moving the dispensing device into tapping engagement against a support with a selected impulse effective to deposit a selected volume on the support, e.g., a selected volume in the volume range 0.01 to 100 nl.

The positioning and dispensing structures are controlled by a control unit in the apparatus. The unit operates to (i) place the dispensing device at a loading station, (ii) move the capillary channel in the device into a selected reagent at the loading station, to load the dispensing device with the reagent, and (iii) dispense the reagent at a defined array position on each of the supports on the holder. The unit may further operate, at the end of each dispensing cycle, to wash the dispensing device by (i) placing the dispensing device at a washing station, (ii) moving the capillary channel in the device into a wash fluid, to load the dispensing device with the fluid, and (iii) removing the wash fluid prior to loading the dispensing device with a fresh selected reagent.

The dispensing device in the apparatus may be one of a plurality of such devices which are carried on the arm for dispensing different analyte assay reagents at selected positions.

In another aspect, the invention includes a substrate with a surface having a microarray of at least 10^3 distinct polynucleotide or polypeptide biopolymers in a surface area of less than about 1 cm^2. Each distinct biopolymer (i) is disposed at a separate, defined position in said array, (ii) has a length of at least 50 subunits, and (iii) is present in a defined amount between about 0.1 femtomoles and 100 nanomoles.

In one embodiment, the surface is a glass slide coated with a polycationic polymer, such as poly-l-lysine, and the biopolymers are polynucleotides. In another embodiment, the substrate has a water-impermeable backing, a water-permeable film formed on the backing, and a grid formed on the film. The grid is composed of intersecting water-impervious grid elements extending from the backing to positions raised above the surface of said film, and partitions the film into a plurality of water-impervious cells. A biopolymer array is formed within each cell.

More generally, there is provided a substrate for use in detecting binding of labeled polynucleotides to one or more of a plurality of different-sequence, immobilized polynucleotides. The substrate includes, in one aspect, a glass support, a coating of a polycationic polymer, such as poly-l-lysine, on the surface of the support, and an array of distinct polynucleotides bound non-covalently to said coating, where each distinct biopolymer is disposed at a separate, defined position in a surface array of polynucleotides.

In another aspect, the substrate includes a water-impermeable backing, a water-permeable film formed on the backing, and a grid formed on the film, where the grid is composed of intersecting water-impervious grid elements extending from the backing to positions raised above the surface of the film, and forms a plurality of cells. A biopolymer array is formed within each cell.

Also forming part of the invention is a method of detecting differential expression of each of a plurality of genes in a first cell type with respect to expression of the same genes in a second cell type. In practicing the method, fluorescence-labeled cDNAs are first produced from mRNAs isolated from the two cell types, where the cDNAs from the first and second cell types are labeled with first and second different fluorescent reporters.

A mixture of the labeled cDNAs from the two cell types is added to an array of polynucleotides representing a plurality of known genes derived from the two cell types, under conditions that result in hybridization of the cDNAs to complementary-sequence polynucleotides in the array. The array is then examined by fluorescence under fluorescence excitation conditions in which (i) polynucleotides in the array that are hybridized predominantly to cDNAs derived from one of the first or second cell types give a distinct first or second fluorescence emission color, respectively, and (ii) polynucleotides in the array that are hybridized to substantially equal numbers of cDNAs derived from the first and second cell types give a distinct combined fluorescence emission color. The relative expression of known genes in the two cell types can then be determined by the observed fluorescence emission color of each spot.
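Reading each spot's color as described above amounts to classifying the spot by the ratio of its two reporter signals. The sketch below is only an illustration, not the procedure of the invention. It assumes background-subtracted intensities that are already normalized so that equal amounts of cDNA from the two cell types give equal signals. The 2-fold cut-off and the function name `classify_spot` are hypothetical choices.

```python
import math


def classify_spot(first: float, second: float, fold: float = 2.0) -> str:
    """Classify one array element from its two fluorescence channels.

    first, second: background-subtracted intensities of the first and second
    fluorescent reporters, assumed normalized so that equal amounts of
    hybridized cDNA from the two cell types give equal signals.
    fold: hypothetical ratio at or above which one cell type is called
    predominant.
    """
    if first <= 0 or second <= 0:
        raise ValueError("intensities must be positive")
    log_ratio = math.log2(first / second)
    cutoff = math.log2(fold)
    if log_ratio >= cutoff:
        return "first"      # predominantly first-cell-type cDNA: first color
    if log_ratio <= -cutoff:
        return "second"     # predominantly second-cell-type cDNA: second color
    return "combined"       # roughly equal amounts: combined color


if __name__ == "__main__":
    # Hypothetical intensities (first reporter, second reporter) for three spots.
    spots = {
        "gene_a": (4000.0, 1000.0),
        "gene_b": (900.0, 3600.0),
        "gene_c": (1000.0, 1100.0),
    }
    for name, (ch1, ch2) in spots.items():
        print(name, classify_spot(ch1, ch2))
```

A log ratio is used so that a two-fold excess in either channel is treated symmetrically. The choice of cut-off, and how "substantially equal" is defined, would depend on the noise of the actual scanner and labeling.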

These and other objects and features of the invention will become more fully apparent when the following detailed description of the invention is read in conjunction with the accompanying figures.

The file of this patent contains at least one drawing executed in color. Copies of this patent with color drawing(s) will be provided by the Patent and Trademark Office upon request and payment of the necessary fee.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a side view of a reagent-dispensing device having an open-capillary dispensing head constructed for use in one embodiment of the invention;

FIGS. 2A-2C illustrate steps in the delivery of a fixed-volume bead on a hydrophobic surface employing the dispensing head from FIG. 1, in accordance with one embodiment of the method of the invention;

FIG. 3 shows a portion of a two-dimensional array of analyte-assay regions constructed according to the method of the invention;

FIG. 4 is a planar view showing components of an automated apparatus for forming arrays in accordance with the invention;

FIG. 5 shows a fluorescent image of an actual 20x20 array of 400 fluorescently-labeled DNA samples immobilized on a poly-l-lysine coated slide, where the total area covered by the 400 element array is 26 square millimeters;

FIG. 6 is a fluorescent image of a 1.8 cm x 1.8 cm microarray containing lambda clones with yeast inserts, the fluorescent signal arising from the hybridization to the array with approximately half the yeast genome labeled with a green fluorophore and the other half with a red fluorophore;

FIG. 7 shows the translation of the hybridization image of FIG. 6 into a karyotype of the yeast genome, where the elements of the FIG. 6 microarray contain yeast DNA sequences that have been previously physically mapped in the yeast genome;

FIG. 8 shows a fluorescent image of a 0.5 cm x 0.5 cm microarray of 24 cDNA clones, where the microarray was hybridized simultaneously with total cDNA from a wild type Arabidopsis plant labeled with a green fluorophore and total cDNA from a transgenic Arabidopsis plant labeled with a red fluorophore, and the arrow points to the cDNA clone representing the gene introduced into the transgenic Arabidopsis plant;

FIG. 9 shows a plan view of a substrate having an array of cells formed by barrier elements in the form of a grid;

FIG. 10 shows an enlarged plan view of one of the cells in the substrate in FIG. 9, showing an array of polynucleotide regions in the cell;

FIG. 11 is an enlarged sectional view of the substrate in FIG. 9, taken along a section line in that figure; and

FIG. 12 is a scanned image of a 3 cm x 3 cm nitrocellulose solid support containing four identical arrays of M13 clones in each of four quadrants, where each quadrant was hybridized simultaneously to a different oligonucleotide using an open face hybridization method.



DETAILED DESCRIPTION OF THE
INVENTION

I. Definitions

Unless indicated otherwise, the terms defined below have the following meanings:

"Ligand" refers to one member of a ligand/anti-ligand binding pair. The ligand may be, for example, one of the nucleic acid strands in a complementary, hybridized nucleic acid duplex binding pair, an effector molecule in an effector/receptor binding pair, or an antigen in an antigen/antibody or antigen/antibody fragment binding pair.

"Anti-ligand" refers to the opposite member of a ligand/anti-ligand binding pair. The anti-ligand may be the other of the nucleic acid strands in a complementary, hybridized nucleic acid duplex binding pair, the receptor molecule in an effector/receptor binding pair, or an antibody or antibody fragment molecule in an antigen/antibody or antigen/antibody fragment binding pair, respectively.

"Analyte" or "analyte molecule" refers to a molecule, typically a macromolecule, such as a polynucleotide or polypeptide, whose presence, amount, and/or identity are to be determined. The analyte is one member of a ligand/anti-ligand pair.

"Analyte-specific assay reagent" refers to a molecule effective to bind specifically to an analyte molecule. The reagent is the opposite member of a ligand/anti-ligand binding pair.

An "array of regions on a solid support" is a linear or two-dimensional array of preferably discrete regions, each having a finite area, formed on the surface of a solid support.

A "microarray" is an array of regions having a density of discrete regions of at least about 100/cm^2, and preferably at least about 1000/cm^2. The regions in a microarray have typical dimensions, e.g., diameters, in the range of between about 10-250 µm, and are separated from other regions in the array by about the same distance.
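The density thresholds in this definition can be related to the center-to-center spacing of the regions and to the area an array occupies. The following is a minimal sketch that assumes a square grid. The function names are illustrative only.

```python
def regions_per_cm2(pitch_um: float) -> float:
    """Regions per cm^2 for a square grid with the given center-to-center pitch (µm)."""
    per_cm = 10_000.0 / pitch_um  # 1 cm = 10,000 µm
    return per_cm * per_cm


def density_from_count(n_regions: int, area_mm2: float) -> float:
    """Regions per cm^2 when n_regions occupy area_mm2 (1 cm^2 = 100 mm^2)."""
    return n_regions / (area_mm2 / 100.0)


def classify_density(density_per_cm2: float) -> str:
    """Compare a density with the 100/cm^2 minimum and preferred 1000/cm^2 levels."""
    if density_per_cm2 >= 1000.0:
        return "preferred microarray"
    if density_per_cm2 >= 100.0:
        return "microarray"
    return "below microarray density"


if __name__ == "__main__":
    print(regions_per_cm2(250.0))         # regions spaced 250 µm apart
    fig5 = density_from_count(400, 26.0)  # FIG. 5: 400 elements in 26 mm^2
    print(round(fig5), classify_density(fig5))
```

With this arithmetic, a 250 µm pitch gives 40 regions per cm, or 1,600 regions/cm^2. The 400-element array of FIG. 5 works out to roughly 1,500 regions/cm^2, so both are above the preferred density.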

A support surface is "hydrophobic" if an aqueous-medium droplet applied to the surface does not spread out substantially beyond the area size of the applied droplet. That is, the surface acts to prevent spreading of the droplet applied to the surface by hydrophobic interaction with the droplet.

A "meniscus" means a concave or convex surface that forms on the bottom of a liquid in a channel as a result of the surface tension of the liquid.

"Distinct biopolymers", as applied to the biopolymers forming a microarray, means an array member which is distinct from other array members on the basis of a different biopolymer sequence, and/or different concentrations of the same or distinct biopolymers, and/or different mixtures of distinct or different-concentration biopolymers. Thus an array of "distinct polynucleotides" means an array containing, as its members, (i) distinct polynucleotides, which may have a defined amount in each member, (ii) different, graded concentrations of given-sequence polynucleotides, and/or (iii) different-composition mixtures of two or more distinct polynucleotides.

"Cell type" means a cell from a given source, e.g., a tissue or organ, or a cell in a given state of differentiation, or a cell associated with a given pathology or genetic makeup.

II. Method of Microarray Formation 

This section describes a method of forming a microarray of analyte-assay regions on a solid support or substrate, where each region in the array has a known amount of a selected, analyte-specific reagent.
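This section goes on to describe a spotting cycle: one reagent is loaded, tapped onto the same array position on each support, and then the tip is washed and reloaded. It also ties the amount of reagent in each region to the size of the deposited bead. The sketch below models both under stated assumptions. The `Step` record, the row-by-row assignment of positions, and the example concentration are hypothetical. The bead is treated as a hemisphere whose diameter equals the tip width, which is the same simplification the description applies to Table 1.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    action: str                           # "load", "dispense" or "wash"
    reagent: str
    support: int = -1                     # support index; -1 when not applicable
    position: Tuple[int, int] = (-1, -1)  # (row, column) on the support


def spotting_schedule(reagents: List[str], n_supports: int, n_cols: int) -> List[Step]:
    """Assign each reagent its own array position (filled row by row); the
    loaded tip deposits it at that position on every support, then is washed."""
    steps: List[Step] = []
    for i, reagent in enumerate(reagents):
        pos = (i // n_cols, i % n_cols)
        steps.append(Step("load", reagent))
        for s in range(n_supports):
            steps.append(Step("dispense", reagent, s, pos))
        steps.append(Step("wash", reagent))
    return steps


def hemisphere_volume_pl(diameter_um: float) -> float:
    """Volume in pl of a hemispherical bead of the given diameter (1 pl = 1000 µm^3)."""
    radius = diameter_um / 2.0
    return (2.0 / 3.0) * math.pi * radius ** 3 / 1000.0


def amount_fmol(volume_pl: float, conc_nM: float) -> float:
    """Reagent in the bead, in fmol: 1 pl x 1 nM = 1e-21 mol = 1e-6 fmol."""
    return volume_pl * conc_nM * 1e-6


if __name__ == "__main__":
    for step in spotting_schedule(["A", "B"], n_supports=3, n_cols=20):
        print(step)
    for d in (20.0, 200.0):
        v = hemisphere_volume_pl(d)
        print(f"d = {d:.0f} um -> {v:.1f} pl, {amount_fmol(v, 100.0):.4f} fmol at 100 nM")
```

Under the hemisphere assumption, tip widths of 20 to 200 µm give beads of about 2 pl to 2 nl. This matches the preferred 2 pl to 2 nl deposition range given later in this section. Washing once per reagent, rather than once per support, is what lets a single load print the same position on many supports.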
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HU. 1 ilJusm.es, ffl » pinially Kheoaiic view, a >od ,w,y frem tte subsnate surf^t B.kii.» nB««u,v 

devia geoe«l)y ifldudw i rugeoi dispoae, U bivin. u dispeascr ui^t ^= «b« np of ibe 

ck.og.u opco c.pill.0 cWl 14 bold -STOK^ Hg^L.T'^r^^.iS'ic 

Dn-oftbeit.een'»»W«)n.«d)«iadieiied.il«.«will s acm^atio\ht^iDCh*l^^^^^ -7^^ 

be c.e«.-b«. below, l.e «pill.o- cb.nod i, ,om^ by . ^r^e'C^iS^cf^ X'S^'l 

put of sp.e«d.«p»n. coesicDsve, eion|iu Dcmben Ua. flowint of ibe bouid iao tbe eiBill.«»7J^; J^**.* 

Ub wbieb .re upered iow«J o»e uotbe, ud coovej^e .1 «7S wS^^nt u.^w 

. lip or lip >egioD 18 ti tbe lomxr eod of ibc ebusel. More cbuinel. m«?o iTAo 2B 

geoeraUy. ibe open cfaaooel is fomed by «i lets two « rg jr ck_« w a. i . 

ebngtte. mced-tptn memben idtpted to bold t qutntiiv JrZ^ • ™° wppon 

of ««geDr ioluiioo* »d taving .up «gioD .i wbicb ^*iu!^!f.f * hnbopbobie surf.ct "n^ 

.queiHi> ».lulii.n in the ch.nnel r..rn« , men«uis. »ch »i 7^ iUuaniM ibil kpud cominuc to flow from ibe 

tbe cooenx fseniscus illustnicd .1 20 in HG. 2A. TT»c h^^/"" " » 

[^^^.^^"^--^'^'^'^''ot^'^^ « «^o^»^Si-'^'rb?«S'b7t^' 

U^tb condoled .fe«». » nC. 1. tbe dispenser device J^^J^^^L" SS^it^^lltJ^ "T" 

.to ^eludes «™a«re for n««.xng tt,e d^r r.pidly .odbyU.e«,,fKr .ens^nofibe^i^SS'.Ss^'^^i 

iow.nl .nd .w.y from . .upper. 5urf«, to efteaiog , bc«J ew^•.mrc. At thi. poiTu. 

depaaooDof.kaowBUDOoaiofsolBiioouiUKdapeQseroo » wiU hive fei™ri .nri , ^'""^ 
.«.pror,«wiUberie.aibedbelowwiU.,t£ereBc^onGS. 

2A-2C m Ibc cn.bodin,cn. .town. U.i. indudc . ^ Slf^oo efef^d ^L'T' "'"''""^ 

Mieoo;d 22 wtucb .cm-.i.ble to dr.w t soleooid piBoo 24 c r -jj- 

r»pidly downwirdly, then leleue ibe pinon. e.g uader t- i "Vn'>-<wpcnBng on * mote bydrophflic .urf.ce, ibe 

spring bi.4. IO • norni.1. r.ised posiiioa « shown. The a ''^.^V' « • '"«leocy lo be.d. .od tbe di^Ds^ 

dispenser is earned on ihe pision bv . connecting member ?"»™* °»»« Knsiiive lo .be loul dweU time of tbe 

26. u sbown. Tbe just-descrfted moving stncnirc is .Iso «»P » unmedi.tt viciBii>- of tbe suppon 

referred ii> hercm .n dij^wwing mans for miiving the P°*"»» iUustt.led in RGS. 2B .od 2C. 

dispenser into cng.gemeni wiib . wlid wipport, for dispeas- desired ilcrwium vulumc, Le.. buil vulume. fumieJ 
ing a known volume of fluid on tbe nipport. x ^ metbod is prefenbiy in .be nsgc 2 pi (picoliiers) to 

Ibe dispensing device jusi described is cwied on an am (wnolittts), Ulbougb volumes k bigb u 100 nl or more 

2S ib.l ro.y be nxjved eitber linearly or io u x-y pluie lo " tppreciiied ibai tbe selected 

positioD tbe dispenser u . Kkcted depooitioo posiuon. as J*'*'"" 'l^'"™* «*=Pe«»«l on (0 tbe -footprint- of tbe 

wiD be described. Iflf ^' ^^-^'^^ 're* spianed by tbe tip. (ii) 

HGS. 2A-2C iUustratc tbe mclbod of depositing . known •?PP«f»"rf««. »««> OS) tbe time 

.mo»nlofre.gem.oh,tiooi«tbeju«^teXjS^^^ ™n 17:2 *1 ^JlV "^l'^"" 

.be surf .ce of . sobd support, sueb » tbe «;ppon i^itS J^^Tvii^ 

..30.-n,e«.pponb.pol>T»er.gUss,oro.b?^d-m.uS Se^^^oTSS^'t^'.'"""'''^'^'"''' '^"^ 

suppon baving . surf.a tndieaied .1 31. Zf.T-?^A ■ ^ ^ dispenser onto ibe nppon 

sucb ».rf.ce desaibed below i, . gJas, wrfaiTb^ ,n „nWW S ^ ''«*P««' -^P » •»PP«J 

.b«»bed Uyer of . polyciioaic polymer, sud. «^y" "T I ^'T '"'^«' ^"l* • ""'l "silence 

lysine *^ u»e in cotii.e.w,tb tbe suppon of less tbw. bout 1 msec. 

1 ^ . .t«J « rate of upw.nJ tf.vel from tbe surfKc of .bom 10 

la anoiber embodjoeot, tbe nirf.ct bu or b foioed to cta/sec. 

hive . reUiK-ely hydmphnliic duraeter. ix, one that eau^es AMumin? thii h,.^ .h., f« 

.queous medium deposited on tbe «rf« to bead. Av.ricty su^^.^tL^i.'!^.? , h T on coottci witb tbe 

of k»o«x bydropbobie pohTDeis. mcb as v>hmvT^J . * ,'*'"«P'«."«=»' w.ih . diameter approi.. 

polypropylenl orW-b^^' lir^^SirS b^K nG^^jrii^'vlmT^? t"": "T^'^'^ " » 

properties, as do glass and a variety of lubricaoTw^ oibcr iiLnZ^ /nt fo«aed m rtUuon to 

b^bobic Alms U..I may be appL u, U« «ppon i^^o^LTJT,^^.:^.^^^^^^^ 

t-^,^^ . , ^ . . . " iBcrttsed from aboui 20 lo 200 urn. 

ImuilJy. Ibc dispenser u loaded wxii a lelected a&alyte. 35 

aij*ciIiL* mgcni ituluiiun, Micii by dipping ibc tlivpcmcr TABLE 

Dp, ificr washiat. into a iciutioo of ibc rragcot, aod i-t 1 
aliowiag fiUiog by capillary ficrw into tbe dispeoser cbaxuicl. 
Tbe dispenser is now moved to a scleaed poiiiioB uiih 
respect to a suppon surftce. placing tbe dispctser lip 
duecUy above tbe support-surface posiiion at wbicb Ibe 
reagcot is 10 be deposited. Tbis movemeot laices place %itb 

Ihe dc^xa^vr lip in iu ra ixd pmitinn. aA xcn in RCi. 2 A, 

wbcTC the lip is lypicaUy at lc4si several 1-5 idzb above tbe* ai . • l ^ 

surface of tbe substrate ^ ^ „ A* * 8^^° *f • «o be reduced io a 

uxth th, rii«»n»r «n ««iw»»,,< .i ■ cootfoUcd fasbioo by locrcaang surf.ce bydropkabidty 

With tbe «l»«P«»e' » positiooed. solenoid 22 u now rtdudag tise of esotaci of tbe bp witb tbe «rf «x ^„ 

.an^ated u, cause .be d«pens« up ,o move rapidly .ow.nJ i»g r.te of movemen. of tbe tij Tw.j ITm ^e* 
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3Dd>or iscrcasing tbc N-isoosity of tbc luediua. Once these Soleooid 76 is voder the coaiDl of i cootrol unit T7 wtnse 

ptruD tiers are &2ed, a selected dcpositioo volume id the openDoo wiU be drsaibed below. The soleooid b also 

desired pi to nl range can be achieved io a repcauble referred to bereio as dispcosicg means for ooviag tbe device 

fashioa. iato tapping cngagcmcoi with a suppcn, wfacs the device is 

After dcpoftitiog a bead at coc adcocd bcatioo on a < pcntioocd at a de^oed amy pcasiuoo V'-iih rcspea to thai 

support, the up is lypialW moved to a coirespoo^Bg support. 

posiiioD on a secood support, t droplet is deposiied ai that The diyrnsfr device is earned oo ao arm 14 u-hicb is 

position, aad this process is repeated smi} a liquid droplet of threadcdly motmted oo i «-orm screw 80 driven (routed) io 

the rtacent has beeo deposiied ai a selected position on each a deared dircann by i stepper nx>tor tQ also under the 

of a plurality of supporu. ic cootrol of unit T7. At its left cod in the figure scrrw SO is 

Tbs tip is the a washed to recscvt the reagent liquid, filled cotcJ in ■ siecvc A4 fur niuiiiwi ahcnit the strew i»Ln. Ai 

with another reageni liquid and this reagent is now deposited its other end, the screw is mounted to the drive shah of the 

at eacb another amy position on each of the supporu. in one stepper motor, which in turn is carried oo a sleeve 86. The 

embodiment, the lip is washed and refilkd by the steps of (i") dispenser device, worm socw, the two sleeves mounting tbc 

dipping the capillary channel of the device io i with worm screw, u)d the stepper motor used in moving the 

solution, (ii) removing wash aohitioo drawn into the capil- device in the "x^ (bohzontal) direction m the figure form 

lary chaxmei, aod (iii) dipping the capillary channel into the ^st is referred to here collcaively as a displacement 

new reagent sohtiioc. assembly W. 

From the foregoing, it wiU be appreciated that the Use ilnspliccmcni ss^mbly is cuiKiruacil lu prixlucc 

rwecren^Iike. npen<apnLary di^penKcr trp prrrvideK the preoac, micr^ruge movxmcnt in tbc direction of tbc socw, 

advABtagcs that (i) tbc open charmcl of the t^ faciUutcs i-e., along ao x axis in the figure. In ooe mode, the assembly 

rapid, eScieni washing and drying before reloading the tip funaksns to move the dispenser in x-azis iocrements baviog 

wiih a new reageot. (ti) passive capiQary aaion can load the a selected distance in the range 5*25 fm. In ar>other mode, 

sample directly from a standard micro^xll plate while the dispenser unit may be mo\'cd in predae x-axis iocre- 

retaining sufficient sample is the open capilbry reservoir for ^ cseias of several microns or more, for positioning the 

the printing of oim)erous arrays, (iii) open capillaries arc less dixpcmcr at awiciaicd pmitinm on adjacent )Qippon.\, a5 

prone to doggmg than closed capillaries, and (iv) open will be described below. 

capillaries do not require a perfectly faced bottom surface The di^lacemcni assembly, in mm, b mounted for move- 

for fluid delivery. ^ ment in the "y* (vcnical) axis of the figure, for positioning 

A portion of a microarray 36 formed on the surface 38 of ' the dispenser at a seleaed y axis position. The sirucmre 

t ttuIiJ kuppun 40 in auu>nUncc with the mciboJ }\nx moimting tbe assembly includes a fixed rod 88 mouotcd 

described is shown in FIG. 3. Tbc array is formed of a rigidly between a pair of frame bars 90, 92. and a worm 

plurality of aoalyte-cpecific reagent regions, such as regions screw 94 mounted for roiation bcrtveeo a pair of frame bars 

42. where eacb region may inchide a different analyte- 96, 98. The ^nrm 5crcw \%. driven (muted) by a .Mcpper 

specific reagent As indicated above, the diameter of each * motor 300 which operates under tbc control of unit 77. Tbc 

region is preferably between about 20-2UU^. Jhe apadng motor is mouotcd on bir 96, as shown, 

beiwceneach re^oD aod its closest (non-diagonal) neighbor. The struaure just described, including worm screw 94 

measured from center-irxxnter (indicated at 44), U prefer- atjd motor 100. is construaed to produce prrcisc, micro- 

»bly in the range of about 20-iOOAcn.Thus, for example, as ^ range movcmeni in the direction nf the screw, ix., along a 

array having a center-to^mer spacing of about 250 ^ y axis in tbe figure. As ibove, the structure hioaions in one 

contains about 40 regions/cm or 1,^00 rcgions^'cm". After mode to move the dispenser m y.ixis incremeDU having a 

formation of the array, tbe suppon is uxaied to evaporate the selected diujncc in the range .^250 /on. «oc1 io i kccuml 

liquid of the droplet forming each region, to leave a desired mode, to move the dispenser in precise y-axb increments of 

array of dried, relatively flat regions. This drying may be several microns ijm) or more, for positiontng tbc dispenser 

dooe by beating or under vacuum. it associated positions on adjacent supports. 

in some cases, it is desired to first refaydrate tbc droplets The displacement assembly and struaure for moving this 

contuning tbe analyie reagents to allow for more lime for assembly in tbe y axis ire referred to herein coUectKely as 

adsorption to the solid suppon. It is also possible to spot out positioning means for positioning tbe dispeosine device at a 

the analyu reagents in a humid envimnmenl vi that dmplcL^ ^ selected amy position with respect to a support, 

do not dry until the arnying operation is complete. A bolder 102 in the ipparitus functions lo hold a plurality 

in. Automated Apparatus for Forming Ainys of suppnav. ^uch a.^ ^uppori^ IfM on which il« micmarray^ 

in toother aspect, the ioventioo includes an automated of reagent regions are to be formed by tbe apparatus. Tbe 

apparatus for forming an anay of analyu-assay regions on holder provides a number of recessed slots, such as slot 106, 

a solid suppon. where each region in the array has a known 55 which receive the supports, aod poiitioo them at precise 

amount of a selected, analyte -specific reagent selected positioos with respect to the frame bars on which 

Tbe apparatus is sbown io plaiut, and partially schematic the dispenser moving means is mounted, 

view in FIG- 4. A dispenser device 73 in tbe apparatus has As noted above, the control unit in tbe device functions to 

the basic coostruaion described above with respect to FIG. aauaie the two stepper moiors aixj dispenser solenoid in a 

1. and includes a dispenser 74 having an opcn<apiUary sequence designed for automated operation of the apparatus 

chani^cl lerminatmg at a tip. substantially as shown in FIGS. m forming a selected microanay of reagent regions 00 eacb 

1 and 2A-2C. of a plurality of supports. 

Tbe dispenser is motmted in the device for movement Tbe control unit is constructed, according to ooovcniional 

toward and away from a dispensing position at which the tip microprocessor control principles, to provide appropriate 

of the dispenser ups a support surface, to dispense a selected C5 signals to each of the solenoid and each of the stepper 

volume of reagent solution, as described above. This move- motors, in a given timed sequence and for appropriate 

tneni is effected by 1 solerxdid 76 as described above. signalling time. Tbe coostruaion of the unit, and tbe settings 
[...] will be understood from the following description of a typical apparatus operation.

Initially, one or more supports are placed in one or more slots in the holder. The dispenser is then moved to a position directly above a well (not shown) containing a solution of the first reagent to be dispensed on the support(s). The dispenser solenoid is now actuated to lower the dispenser tip into this well, causing the capillary channel in the dispenser to fill. Motors 82, 100 are now actuated to position the dispenser at a selected array position at the first of the supports. Solenoid actuation of the dispenser is then effective to dispense a selected-volume droplet of that reagent at this location. As noted above, this operation is effective to dispense a selected volume, preferably between 2 pl and 2 nl, of the reagent solution.

The dispenser is now moved to the corresponding position at an adjacent support, and a similar volume of the solution is dispensed at this position. The process is repeated until the reagent has been dispensed at this preselected corresponding position on each of the supports.

Where it is desired to dispense a single reagent at more than two array positions on a support, the dispenser may be moved to different array positions at each support before moving the dispenser to a new support; alternatively, solution can be dispensed at one selected position on each support, and the cycle then repeated for each new array position.

To dispense the next reagent, the dispenser is positioned over a wash solution (not shown), and the dispenser tip is dipped in and out of this solution until the reagent solution has been substantially washed from the tip. Solution can be removed from the tip after each dipping by vacuum, compressed air spray, or the like.

The dispenser tip is now dipped in a second reagent well, and the filled tip is moved to a second selected array position on the first support. The process of dispensing reagent at each of the corresponding second-array positions is then carried out as above. This process is repeated until an entire microarray of reagent solutions has been formed on each of the supports.

IV. Microarray Substrate

[...] a microarray of biological polymers carried on the substrate [...] and preferably an identical microarray of distinct biopolymers [...] an 8×12 rectangular array of cells, such as cell 114 [...]. Each cell supports a microarray 118 of distinct [...] at known, addressable regions of the microarray. Two such regions forming the microarray are indicated at 120, and correspond to regions, such as regions 42, forming the microarray [...].

The 96-cell array shown in FIG. 9 typically has array dimensions between about 12 and 244 mm in width and 8 and 400 mm in length, with the cells in the array having width and length dimensions of 1/12 and 1/8 the array width and length dimensions, respectively, i.e., between about 1 and 20 mm in width and 1 and 50 mm in length.

[...] in FIG. 11, which is an enlarged sectional view taken along a view line in FIG. 10. The substrate includes a water-impermeable backing 126, such as a glass slide or rigid polymer sheet. Formed on the surface of the backing is a water-permeable film 128. The film is formed of a porous membrane material, such as a nitrocellulose membrane, or a porous web material, such as a nylon, polypropylene, or PVDF porous polymer material. The thickness of the film is preferably between about 10 and 3000 μm. The film may be applied to the backing by spraying or coating uncured material on the backing, or by applying a preformed membrane to the backing. The backing and film may be obtained as a preformed unit from a commercial source, e.g., a plastic-backed nitrocellulose film available from Schleicher and Schuell Corporation.

With continued reference to FIG. 11, the film-covered surface in the substrate is partitioned into a desired array of cells by water-impermeable grid lines, such as lines 130, which have infiltrated the film down to the level of the backing and extend above the surface of the film as shown, typically a distance of 100 to 2000 μm above the film surface.

The grid lines are formed on the substrate by laying down an uncured or otherwise flowable resin or elastomer solution in an array grid, allowing the material to infiltrate the porous film down to the backing, then curing or otherwise hardening the grid lines to form the cell-array substrate.

One preferred material for the grid is a flowable silicone available from Loctite Corporation. The barrier material can be extruded through a narrow syringe (e.g., 22 gauge) using air pressure or mechanical pressure. The syringe is moved relative to the solid support to print the barrier elements as a grid pattern. The extruded bead of silicone wicks into the pores of the solid support and cures to form a shallow waterproof barrier separating the regions of the solid support.

In alternative embodiments, the barrier element can be a [...] material or a thermoset material such as epoxy. The barrier material can also be a UV-curing polymer which is exposed to UV light after being printed onto the solid support. The barrier material may also be applied to the solid support [...] stamping of [...] a shallow grid which is [...] or otherwise adhered to the solid support. In another embodiment, the barrier element may also be [...] the porous membrane to a non-porous backing [...] a non-porous material. The barrier can be printed either before or after the microarray of biomolecules is printed on the solid support.

As can be appreciated, the cells formed by the grid lines and the underlying backing are water-impermeable [...]. Defined-volume samples can therefore be placed in each well without risk of cross-contamination with sample material in adjacent cells. In FIG. 11, defined-volume samples, such as samples 134, are shown in the cells.
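For the 8×12 (96-cell) format, cell width and length are 1/12 and 1/8 of the array width and length. The sketch below derives cell dimensions from overall array dimensions under that rule. It is illustrative only: the function name is hypothetical, and it ignores the width taken up by the grid lines themselves.

```python
def cell_dimensions(array_width_mm, array_length_mm, cols=12, rows=8):
    """Return (cell_width_mm, cell_length_mm) for a rows x cols grid of cells.

    Assumes every cell gets an equal share of the array, i.e. 1/cols of the
    width and 1/rows of the length; grid-line thickness is ignored.
    """
    return array_width_mm / cols, array_length_mm / rows


# Smallest and largest array dimensions quoted for the 96-cell format.
for width_mm, length_mm in [(12, 8), (244, 400)]:
    cell_w, cell_l = cell_dimensions(width_mm, length_mm)
    print(f"{width_mm} x {length_mm} mm array -> {cell_w:.1f} x {cell_l:.1f} mm cells")
```

This prints cells of 1.0 x 1.0 mm and 20.3 x 50.0 mm, which brackets the stated range of about 1-20 mm in width and 1-50 mm in length.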
As noted above, each cell contains a microarray of distinct biopolymers. In one general embodiment, the microarrays in the wells are identical arrays of distinct biopolymers, e.g., different-sequence polynucleotides. Such arrays can be formed in accordance with the methods described in Section II, by depositing a first selected polynucleotide at the same selected microarray position in each of the cells, then depositing a second polynucleotide at a different microarray position in each well, and so on until a complete, identical microarray is formed in each cell.

In a preferred embodiment, each microarray contains at least 10³ distinct polynucleotide or polypeptide biopolymers per surface area of less than about 1 cm². Also in a preferred embodiment, the biopolymers in each microarray region are present in a defined amount between about 0.1 femtomoles and 100 nanomoles. The ability to form high-density arrays of biopolymers, where each region is formed of a well-defined amount of deposited material, can be achieved in accordance with the microarray-forming method described in Section II.

Also in a preferred embodiment, the biopolymers are polynucleotides having lengths of at least about 50 bp, i.e., substantially longer than oligonucleotides which can be formed in high-density arrays by schemes involving parallel, stepwise polymer synthesis on the array surface.

In the case of a polynucleotide array, in an assay procedure, a small volume of the labeled DNA probe mixture in a standard hybridization solution is loaded onto each cell. The solution will spread to cover the entire microarray and stop at the barrier elements. The solid support is then incubated in a humid chamber at the appropriate temperature as required by the assay.

Each assay may be conducted in an "open-face" format where no further sealing step is required, since the hybridization solution will be kept properly hydrated by the water vapor in the humid chamber. At the conclusion of the incubation step, the entire solid support containing the numerous microarrays is rinsed quickly enough to dilute the assay reagents so that no significant cross-contamination occurs. The entire solid support is then reacted with detection reagents if needed and analyzed using standard colorimetric, radioactive or fluorescent detection means. All processing and detection steps are performed simultaneously on all of the microarrays on the solid support, ensuring uniform assay conditions for all of the microarrays on the solid support.

B. Glass-Slide Polynucleotide Array

FIG. 5 shows a substrate 136 formed according to another aspect of the invention, and intended for use in detecting binding of labeled polynucleotides to one or more of a plurality of distinct polynucleotides. The substrate includes a glass substrate 136 having formed on its surface a coating of a polycationic polymer, preferably a cationic polypeptide, such as polylysine or polyarginine. Formed on the polycationic coating is a microarray 140 of distinct polynucleotides, each localized at known selected array regions, such as regions 142.

The slide is coated by placing a uniform-thickness film of a polycationic polymer, e.g., poly-l-lysine, on the surface of a slide and drying the film to form a dried coating. The amount of polycationic polymer added is sufficient to form at least a monolayer of polymers on the glass surface. The polymer film is bound to the surface via electrostatic binding between negative silyl-OH groups on the surface and charged amine groups in the polymers. Poly-l-lysine coated glass slides may be obtained commercially, e.g., from Sigma Chemical Co. (St. Louis, Mo.).

To form the microarray, defined volumes of distinct polynucleotides are deposited on the polymer-coated slide, as described in Section II. According to an important feature of the substrate, the deposited polynucleotides remain bound to the coated slide surface non-covalently when an aqueous DNA sample is applied to the substrate under conditions which allow hybridization of reporter-labeled polynucleotides in the sample to complementary-sequence (single-stranded) polynucleotides in the substrate array. The method is illustrated in Examples 1 and 2.

To illustrate this feature, a substrate of the type just described, but having an array of same-sequence polynucleotides, was mixed with fluorescent-labeled complementary DNA under hybridization conditions. After washing to remove non-hybridized material, the substrate was examined by low-power fluorescence microscopy. The array can be visualized by the relatively uniform labeling pattern of the array regions.

In a preferred embodiment, each microarray contains at least 10³ distinct polynucleotide or polypeptide biopolymers per surface area of less than about 1 cm². In the embodiment shown in FIG. 5, the microarray contains 400 regions in an area of about 16 mm², or 2.5×10³ regions/cm². Also in a preferred embodiment, the polynucleotides in each microarray region are present in a defined amount between about 0.1 femtomoles and 100 nanomoles. As above, the ability to form high-density arrays of this type, where each region is formed of a well-defined amount of deposited material, can be achieved in accordance with the microarray-forming method described in Section II.

Also in a preferred embodiment, the polynucleotides have lengths of at least about 50 bp, i.e., substantially longer than oligonucleotides which can be formed in high-density arrays by various in situ synthesis schemes.

V. Utility

Microarrays of immobilized nucleic acid sequences prepared in accordance with the invention can be used for large-scale hybridization assays in numerous genetic applications, including genetic and physical mapping of genomes, monitoring of gene expression, DNA sequencing, genetic diagnosis, genotyping of organisms, and distribution of DNA reagents to researchers.

For gene mapping, a gene or a cloned DNA fragment is hybridized to an ordered array of DNA fragments, and the identity of the DNA elements applied to the array is unambiguously established by the pixel or pattern of pixels of the array that are detected. One application of such arrays for creating a genetic map is described by Nelson, et al. (1993). In constructing physical maps of the genome, arrays of immobilized cloned DNA fragments are hybridized with other cloned DNA fragments to establish whether the cloned fragments in the probe mixture overlap and are therefore contiguous to the immobilized clones on the array. For example, Lehrach, et al. describe such a process.

The arrays of immobilized DNA fragments may also be used for genetic diagnostics. To illustrate, an array containing multiple forms of a mutated gene or genes can be probed with a labeled mixture of a patient's DNA which will preferentially interact with only one of the immobilized versions of the gene.

The detection of this interaction can lead to a medical diagnosis. Arrays of immobilized DNA fragments can also be used in DNA probe diagnostics. For example, the identity of a pathogenic microorganism can be established unambiguously by hybridizing a sample of the unknown pathogen's DNA to an array containing many types of known pathogenic DNA. A similar technique can also be used for
unambiguous genotyping of any organism. Other molecules of genetic interest, such as cDNAs and RNAs, can be immobilized on the array or alternately used as the labeled probe mixture that is applied to the array.

In one application, an array of cDNA clones representing genes is hybridized with total cDNA from an organism to monitor gene expression for research or diagnostic purposes. Labeling total cDNA from a normal cell with one color fluorophore and total cDNA from a diseased cell with another color fluorophore, and simultaneously hybridizing the two cDNA samples to the same array of cDNA clones, allows differential gene expression to be measured as the ratio of the two fluorophore intensities. This two-color experiment can be used to monitor gene expression in different tissue types, disease states, response to drugs, or response to environmental factors. An example of this approach is illustrated in Example 2, described with respect to FIG. 8.
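Measuring differential expression "as the ratio of the two fluorophore intensities" can be sketched per array element as below. The background subtraction, the intensity floor, and the log2 scale are assumptions added for illustration; this description does not specify them, and the function name and intensities are hypothetical.

```python
import math


def log2_ratio(red, green, bg_red=0.0, bg_green=0.0, floor=1.0):
    """Background-subtracted log2(red / green) for one array element.

    Channel values that fall below `floor` after background subtraction are
    clamped to `floor`, so a channel with no detectable signal still gives a
    large but finite ratio instead of a division by zero.
    """
    red_signal = max(red - bg_red, floor)
    green_signal = max(green - bg_green, floor)
    return math.log2(red_signal / green_signal)


# Hypothetical intensities: red = diseased-cell cDNA, green = normal-cell cDNA.
spots = {"clone_a": (1200.0, 1150.0), "clone_b": (800.0, 50.0), "clone_c": (40.0, 900.0)}
for name, (red, green) in spots.items():
    print(name, round(log2_ratio(red, green, bg_red=40.0, bg_green=40.0), 2))
```

A value near zero means similar abundance in both samples; positive values mean relatively more signal from the red-labeled sample. Working on a log scale makes 2-fold increases and 2-fold decreases symmetric about zero.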

By way of example and without implying a limitation of scope, such a procedure could be used to simultaneously screen many patients against all known mutations in a disease gene. This invention could be used in the form of, for example, 96 identical 0.9 cm×2.2 cm microarrays fabricated on a single 12 cm×18 cm sheet of plastic-backed nitrocellulose, where each microarray could contain, for example, 100 DNA fragments representing all known mutations of a given gene. The region of interest from each of the DNA samples from 96 patients could be amplified, labeled, and hybridized to the 96 individual arrays, with each assay performed in 100 microliters of hybridization solution. The approximately 1 mm thick silicone rubber barrier elements between individual arrays prevent cross-contamination of the patient samples by sealing the pores of the nitrocellulose and by acting as a physical barrier between each microarray. The solid support containing all 96 microarrays assayed with the 96 patient samples is incubated, rinsed, detected and analyzed as a single sheet of material using standard radioactive, fluorescent, or colorimetric detection means (Maniatis, et al., 1989). Previously, such a procedure would involve the handling, processing and tracking of 96 separate membranes in 96 separate sealed chambers. By processing all 96 arrays as a single sheet of material, significant time and cost savings are possible.
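The example above does not say how the 96 arrays are oriented on the sheet. The sketch below tiles the 12 cm×18 cm sheet with a grid of equal cells both ways round and reports the gap left between neighbouring 0.9 cm×2.2 cm arrays. The assumption that arrays sit on a regular grid of equal cells with no outer margin is ours, as is the function name.

```python
def clearance_cm(sheet_w_cm, sheet_h_cm, cols, rows, array_w_cm, array_h_cm):
    """Gap between neighbouring arrays when a cols x rows grid of equal cells
    tiles the sheet. A negative value means the arrays would overlap."""
    gap_w = sheet_w_cm / cols - array_w_cm
    gap_h = sheet_h_cm / rows - array_h_cm
    return round(gap_w, 3), round(gap_h, 3)


# 96 arrays of 0.9 cm x 2.2 cm on a 12 cm x 18 cm sheet, tried both ways round.
print(clearance_cm(12, 18, cols=12, rows=8, array_w_cm=0.9, array_h_cm=2.2))  # (0.1, 0.05)
print(clearance_cm(12, 18, cols=8, rows=12, array_w_cm=0.9, array_h_cm=2.2))  # (0.6, -0.7)
```

Only the first orientation fits: 12 arrays across the 12 cm edge and 8 along the 18 cm edge. That leaves roughly 1 mm and 0.5 mm between neighbouring arrays for the barrier elements. The orientation is inferred from this arithmetic, not stated in the text.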

The assay format can be reversed, with the patient's or organism's DNA immobilized as the array elements and each array hybridized with a different mutated allele or genetic marker. The gridded solid support can also be used for parallel non-DNA ELISA assays. Furthermore, the invention allows for the use of all standard detection methods without the need to remove the shallow barrier elements to carry out the detection step.

In addition to the genetic applications listed above, arrays of whole cells, peptides, enzymes, antibodies, antigens, receptors, ligands, phospholipids, polymers, drug congener preparations or chemical substances can be fabricated by the means described in this invention for large-scale screening assays in medical diagnostics, drug discovery, molecular biology, immunology and toxicology.

The multi-cell substrate aspect of the invention allows for the rapid and convenient screening of many DNA probes against many ordered arrays of DNA fragments. This eliminates the need to handle and detect many individual arrays when performing mass screenings for genetic research and diagnostic applications. Numerous microarrays can be fabricated on the same solid support, and each microarray can be reacted with a different DNA probe while the solid support is processed as a single sheet of material.
The following examples illustrate, but in no way are intended to limit, the present invention.

EXAMPLE 1

Genomic-Complexity Hybridization to DNA Microarrays Representing the Yeast Saccharomyces cerevisiae Genome with Two-Color Fluorescent Detection

The array elements were randomly amplified PCR (Bohlander, et al., 1992) products using physically mapped lambda clones of S. cerevisiae genomic DNA templates (Riles, et al., 1993). The PCR was performed directly on the lambda phage lysates, resulting in an amplification of both the 35 kb lambda vector and the 5-15 kb yeast insert sequences in the form of a uniform distribution of PCR product between 250 and 1500 base pairs in length. The PCR product was purified using Sephadex G50 gel filtration (Pharmacia, Piscataway, N.J.) and concentrated by evaporation to dryness at room temperature overnight. Each of the 864 amplified lambda clones was rehydrated in 15 μl of 3×SSC in preparation for spotting onto the glass.

The microarrays were fabricated on microscope slides which were coated with a layer of poly-l-lysine (Sigma). The automated apparatus described in Section III loaded 1 μl of the concentrated lambda clone PCR product in 3×SSC directly from 96-well storage plates into the open capillary printing element and deposited ~5 nl of sample per slide, at 380 micron spacing between spots, on each of 40 slides. The process was repeated for all 864 samples and 8 control spots. After the spotting operation was complete, the slides were rehydrated in a humid chamber for 2 hours, baked in a dry 80°C vacuum oven for 2 hours, rinsed to remove unabsorbed DNA, and then treated with succinic anhydride to reduce non-specific adsorption of the labeled hybridization probe to the poly-l-lysine coated glass surface. Immediately prior to use, the immobilized DNA on the array was denatured in distilled water at 90°C for 2 minutes.
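The printing run above (one sample loaded, then deposited at the same spot on each of 40 slides before moving on to the next sample) can be modelled as a simple schedule. This is a simulation sketch, not control code for the apparatus. The function name, the row-by-row grid assignment and the 30-column grid width are hypothetical choices; only the 380 micron pitch, the 864 + 8 samples and the 40 slides come from the text.

```python
def print_plan(samples, n_slides, cols, pitch_um=380):
    """Simulated dispensing schedule as (sample, slide, x_um, y_um) tuples.

    Each sample is assigned one grid position (filled row by row, `cols`
    positions per row) and deposited at that same position on every slide,
    so every slide ends up carrying an identical array. Loading and tip
    washing between samples are only marked by comments here.
    """
    events = []
    for index, sample in enumerate(samples):
        row, col = divmod(index, cols)
        x_um, y_um = col * pitch_um, row * pitch_um
        # (load the sample into the capillary tip here)
        for slide in range(n_slides):
            events.append((sample, slide, x_um, y_um))
        # (wash and dry the tip here before the next sample)
    return events


samples = [f"clone_{i:03d}" for i in range(864)] + [f"control_{i}" for i in range(8)]
plan = print_plan(samples, n_slides=40, cols=30)
print(len(plan))  # 872 spots x 40 slides = 34880 deposits
print(plan[-1])   # last control spot on the last slide
```

At ~5 nl per slide, the 40 deposits from each load use about 200 nl of the 1 μl taken up by the printing element.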

For the pooled chromosome experiment, the 16 chromosomes of Saccharomyces cerevisiae were separated in a CHEF agarose gel apparatus (Biorad, Richmond, Calif.). The six largest chromosomes were isolated in one gel slice and the ten smallest chromosomes in a second gel slice. The DNA was recovered using a gel extraction kit (Qiagen, Chatsworth, Calif.). The two chromosome pools were randomly amplified in a manner similar to that used for the target lambda clones. Following amplification, 5 micrograms of each of the amplified chromosome pools were separately random-primer labeled using Klenow polymerase (Amersham, Arlington Heights, Ill.), with a lissamine-conjugated nucleotide analog (Dupont NEN, Boston, Mass.) for the pool containing the six largest chromosomes and with a fluorescein-conjugated nucleotide analog (BMB) for the pool containing the ten smallest chromosomes. The two pools were mixed and concentrated using an ultrafiltration device (Amicon, Danvers, Mass.).

Five micrograms of the hybridization probe, consisting of both chromosome pools in 7.5 μl of TE, was denatured in a boiling water bath and then snap cooled on ice. 2.5 μl of concentrated hybridization solution (5×SSC and 0.1% SDS) was added, and all 10 μl was transferred to the array surface, covered with a cover slip, placed in a custom-built single-slide humidity chamber and incubated at 60°C for 12 hours. The slides were then rinsed at room temperature in 0.1×SSC and 0.1% SDS for 5 minutes, cover slipped and scanned.

A custom-built laser fluorescent scanner was used to detect the two-color hybridization signals from the 1.8×1.8 cm array at 20 micron resolution. The scanned image was gridded and analyzed using custom image analysis software. After correcting for optical crosstalk between the fluorophores due to their overlapping emission spectra, the red and green hybridization values for each clone on the array were correlated to the physical map position of the clone, resulting in a computer-generated color karyotype of the yeast genome.
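The crosstalk correction mentioned above is not specified in detail here. One common way to do it is to treat each observed channel as the true signal plus a fixed fraction of the other dye's signal and to solve the resulting 2×2 linear system for each spot. The sketch below assumes that linear model; the coefficient values are hypothetical, and in practice they might be estimated from spots carrying only one dye.

```python
def correct_crosstalk(red_obs, green_obs, green_into_red, red_into_green):
    """Undo linear two-channel optical crosstalk.

    Assumed model:
        red_obs   = red   + green_into_red * green
        green_obs = green + red_into_green * red
    Solving this 2 x 2 linear system gives the corrected intensities.
    """
    det = 1.0 - green_into_red * red_into_green
    if det == 0:
        raise ValueError("crosstalk coefficients give a singular system")
    red = (red_obs - green_into_red * green_obs) / det
    green = (green_obs - red_into_green * red_obs) / det
    return red, green


# Hypothetical coefficients: 10% of green leaks into red, 5% of red into green.
print(correct_crosstalk(1020.0, 250.0, green_into_red=0.1, red_into_green=0.05))
```

For these inputs the corrected values are approximately 1000 (red) and 200 (green), the true intensities used to construct the example.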

FIG. 6 shows the hybridization pattern of the two chromosome pools. A red signal indicates that the lambda clone on the array surface contains a cloned genomic DNA segment from one of the six largest yeast chromosomes. A green signal indicates that the lambda clone insert comes from one of the ten smallest yeast chromosomes. Orange signals indicate repetitive sequences which cross-hybridized to both chromosome pools. Control spots on the array confirm that the hybridization is specific and reproducible.
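Turning corrected red and green values into the red, green and orange calls described above requires some decision rule. The thresholds in this sketch (a minimum signal and a 3-fold ratio cutoff) are hypothetical and chosen only to make the idea concrete; the description does not give the rule actually used.

```python
def classify_spot(red, green, min_signal=100.0, ratio_cutoff=3.0):
    """Assign a FIG. 6-style colour label from crosstalk-corrected intensities.

    `min_signal` and `ratio_cutoff` are hypothetical thresholds.
    """
    if max(red, green) < min_signal:
        return "dark"    # no detectable hybridization
    if red >= ratio_cutoff * green:
        return "red"     # insert from one of the six largest chromosomes
    if green >= ratio_cutoff * red:
        return "green"   # insert from one of the ten smallest chromosomes
    return "orange"      # both pools hybridize, e.g. repetitive sequence


for red, green in [(900, 50), (60, 800), (600, 500), (20, 30)]:
    print(red, green, classify_spot(red, green))
```

The four example spots are labeled red, green, orange and dark, in that order.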

The physical map locations of the genomic DNA fragments contained in each of the clones used as array elements have been previously determined by Olson and coworkers (Riles, et al.), allowing for the automatic generation of the color karyotype shown in FIG. 7. The color of a chromosomal section on the karyotype corresponds to the color of the array element containing the clone from that section. The black regions of the karyotype represent false negative dark spots on the array (10%) or regions of the genome not covered by the Olson clone library (90%). Note that the six largest chromosomes are mainly red while the ten smallest chromosomes are mainly green, thus matching the original CHEF gel isolation of the hybridization probe. Areas of the red chromosomes containing green spots, and vice versa, are probably due to spurious sample tracking errors in the formation of the original library and in the amplification and spotting procedures.
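Generating the color karyotype amounts to joining each clone's array color with its known physical-map position and ordering the results along each chromosome. The sketch below shows that join with hypothetical clone names and coordinates. It marks dark (false negative) clones black. Genome regions with no clone in the library, the other source of black in FIG. 7, never appear in the input, so they are simply absent from the output tracks.

```python
from collections import defaultdict


def color_karyotype(map_positions, spot_colors):
    """Arrange clone colours along each chromosome in physical-map order.

    map_positions: clone name -> (chromosome, position) from a physical map.
    spot_colors:   clone name -> colour label called from the array.
    Clones without a call, or called "dark", are drawn black.
    """
    tracks = defaultdict(list)
    for clone, (chromosome, position) in map_positions.items():
        color = spot_colors.get(clone, "dark")
        tracks[chromosome].append((position, "black" if color == "dark" else color))
    return {chrom: [color for _, color in sorted(entries)] for chrom, entries in tracks.items()}


# Hypothetical clones and map coordinates.
positions = {"c1": ("IV", 120_000), "c2": ("IV", 40_000), "c3": ("I", 15_000)}
colors = {"c1": "red", "c2": "orange"}
print(color_karyotype(positions, colors))
```

This prints `{'IV': ['orange', 'red'], 'I': ['black']}`. Clone c2 comes first on chromosome IV because it maps closer to position zero, and c3 is black because it has no call.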

The yeast genome arrays have also been probed with individual clones or pools of clones that are fluorescently labeled for physical mapping purposes. The hybridization signals of these clones to the array were translated into positions on the physical map of the yeast genome.

EXAMPLE 2

Total cDNA Hybridized to Micro Arrays of cDNA Clones with Two-Color Fluorescent Detection

Twenty-four clones containing cDNA inserts from the plant Arabidopsis were amplified using PCR. Salt was added to the purified PCR products to a final concentration of 3×SSC. The cDNA clones were spotted on poly-l-lysine coated microscope slides in a manner similar to Example 1. Among the cDNA clones was a clone representing a transcription factor HAT4, which had previously been used to create a transgenic line of the plant Arabidopsis in which this gene is present at ten times the level found in wild-type Arabidopsis (Schena, et al., 1992).

Total poly-A mRNA from wild-type Arabidopsis was isolated using standard methods (Maniatis, et al., 1989) and reverse transcribed into total cDNA, using a fluorescein nucleotide analog to label the cDNA product (green fluorescence). A similar procedure was performed with the transgenic line of Arabidopsis, in which the transcription factor HAT4 was inserted into the genome using standard gene transfer protocols. cDNA copies of mRNA from the transgenic plant are labeled with a lissamine nucleotide analog (red fluorescence). Two micrograms of the cDNA products from each type of plant were pooled together and hybridized to the cDNA clone array in a 10 microliter hybridization reaction in a manner similar to Example 1. Rinsing and detection of hybridization were also performed in a manner similar to Example 1. FIG. 8 shows the resulting hybridization pattern of the array.

Genes equally expressed in wild-type and transgenic Arabidopsis appeared yellow due to equal contributions of the green and red fluorescence to the final signal. The dots are different intensities of yellow, indicating various levels of gene expression. The cDNA clone representing the transcription factor HAT4, expressed in the transgenic line of Arabidopsis but not detectably expressed in wild-type Arabidopsis, appears as a red dot (with the arrow pointing to it), indicating the preferential expression of the transcription factor in the red-labeled transgenic Arabidopsis and the relative lack of expression of the transcription factor in the green-labeled wild-type Arabidopsis.

An advantage of the microarray hybridization format for gene expression studies is the high partial concentration of each cDNA species achievable in the 10 microliter hybridization reaction. This high partial concentration allows for detection of rare transcripts without the need for PCR amplification of the hybridization probe, which may bias the true genetic representation of each discrete cDNA species.

Gene expression studies such as these can be used for genomics research to discover which genes are expressed in which cell types, disease states, developmental states or environmental conditions. Gene expression studies can also be used for diagnosis of disease by empirically correlating gene expression patterns to disease states.

EXAMPLE 3

Multiplexed Colorimetric Hybridization on a Gridded Solid Support

A sheet of plastic-backed nitrocellulose was gridded with barrier elements made from silicone rubber according to the description in Section IV-A. The sheet was soaked in 10×SSC and allowed to dry. As shown in FIG. 12, 192 M13 clones, each with a different yeast insert, were arrayed 400 microns apart in four quadrants of the solid support using the automated device described in Section III. The bottom left quadrant served as a negative control for hybridization, while each of the other three quadrants was hybridized simultaneously with a different oligonucleotide using the open-face hybridization technology described in Section IV-A. The first [illegible] and last four elements of each array are positive controls for the colorimetric detection step.

The oligonucleotides were labeled with fluorescein, which was detected using an anti-fluorescein antibody conjugated to alkaline phosphatase that precipitated an NBT/BCIP dye on the solid support (Amersham). Perfect matches between the labeled oligos and the M13 clones resulted in dark spots visible to the naked eye and detected using an optical scanner (HP ScanJet II) attached to a personal computer. The hybridization patterns are different in every quadrant, indicating that each oligo found several unique M13 clones from among the 192 with a perfect sequence match. Note that the open capillary printing tip leaves detectable dimples on the nitrocellulose, which can be used to automatically align and analyze the images.
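Which arrayed clones should light up for a given oligonucleotide can be predicted, in silico, by searching each insert sequence for an exact match to the probe. The sketch below does this for made-up sequences. Because the strand orientation of the arrayed M13 inserts is not given here, it simplifies by accepting a match on either strand. A real single-stranded M13 target hybridizes only where the oligo is complementary to that particular strand.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def perfect_matches(oligo, inserts):
    """Names of inserts containing an exact match to the oligo on either strand."""
    targets = (oligo.upper(), reverse_complement(oligo))
    return sorted(
        name for name, seq in inserts.items()
        if any(target in seq.upper() for target in targets)
    )


# Hypothetical insert sequences and probe.
inserts = {
    "m13_001": "GGATCCAAGT",
    "m13_002": "TTTTACTTGG",
    "m13_003": "CCCCCCCCCC",
}
print(perfect_matches("CCAAGT", inserts))  # ['m13_001', 'm13_002']
```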

Although the invention has been described with respect to specific embodiments and methods, it will be clear that various changes and modifications may be made without departing from the invention.
We claim:

1. A method of forming a microarray of discrete analyte-assay regions on a solid support, where each discrete region in the microarray has a selected, analyte-specific reagent, said method comprising

(a) loading an aqueous solution of a selected analyte-specific reagent in a reagent-dispensing device having an elongate capillary channel adapted to hold a quantity of the reagent solution and having a tip region at which the solution in the channel forms a meniscus,

(b) tapping the tip of the dispensing device against a solid support at a defined position on the surface, with an impulse effective to break the meniscus in the capillary channel and deposit a selected volume between 0.002 and 2 nl of solution on the surface, and

(c) repeating steps (a) and (b) until said microarray is formed.

2. The method of claim 1, wherein the reagents used to form the discrete regions in the microarray are distinct nucleic acid strands and wherein steps (a) and (b) are repeated until the microarray has about 100 or more discrete regions of distinct nucleic acid strands per cm² of solid support.

3. The method of claim 1, wherein the reagents used to form the discrete regions in the microarray are distinct nucleic acid strands and wherein steps (a) and (b) are repeated until the microarray has about 1000 or more discrete regions of distinct nucleic acid strands per cm² of solid support.

4. The method of claim 2, wherein the channel is open-sided.

5. The method of claim 3, wherein the channel is open-sided.

6. The method of claim 4, wherein the volume is between 0.002 and 0.[illegible] nl.

7. The method of claim 5, wherein the volume is between 0.002 and 0.[illegible] nl.
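The density thresholds in claims 2 and 3 (about 100 and about 1000 regions per cm²) can be related to the center-to-center spacings used in the description: 250 μm in Section II and 380 microns in Example 1. The sketch below assumes a square grid, where density is simply the square of regions per centimeter. The function names are illustrative.

```python
import math


def regions_per_cm2(pitch_um):
    """Regions per cm^2 on a square grid with the given centre-to-centre pitch."""
    per_cm = 10_000 / pitch_um  # 1 cm = 10,000 um
    return per_cm ** 2


def max_pitch_um(min_regions_per_cm2):
    """Largest square-grid pitch (um) that still reaches the target density."""
    return 10_000 / math.sqrt(min_regions_per_cm2)


print(regions_per_cm2(250))          # 1600.0
print(round(regions_per_cm2(380)))   # 693
print(max_pitch_um(100))             # 1000.0
print(round(max_pitch_um(1000), 1))  # 316.2
```

On these assumptions, a 250 μm pitch gives the 1,600 regions/cm² quoted in Section II. The 380 micron spacing of Example 1 gives roughly 690 regions/cm², above the claim 2 threshold but below that of claim 3. Reaching about 1000 regions/cm² needs a pitch of about 316 μm or less.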
The development and progression of cancer and the experimental reversal of tumorigenicity are accompanied by complex changes in patterns of gene expression. Microarrays of cDNA provide a powerful tool for studying these complex phenomena. The tumorigenic properties of a human melanoma cell line, UACC-903, can be suppressed by introduction of a normal human chromosome 6, resulting in a reduction of growth rate, restoration of contact inhibition, and suppression of both soft agar clonogenicity and tumorigenicity in nude mice. We used a high density microarray of 1,161 DNA elements to search for differences in gene expression associated with tumour suppression in this system. Fluorescent probes for hybridization were derived from two sources of cellular mRNA [UACC-903 and UACC-903(+6)] which were labelled with different fluors to provide a direct and internally controlled comparison of the mRNA levels corresponding to each arrayed gene. The fluorescence signals representing hybridization to each arrayed gene were analysed to determine the relative abundance in the two samples of mRNAs corresponding to each gene. Previously unrecognized alterations in the expression of specific genes provide leads for further investigation of the genetic basis of the tumorigenic phenotype of these cells.

DNA microarrays, containing 1,161 total elements,
including 870 different cDNAs and controls (see
Methods), were printed robotically onto a glass micro-
scope slide in four quadrants covering an area of about
1 cm² (Fig. 1). We prepared fluorescent cDNA probes
using total poly(A)+ mRNA from UACC-903 cells and
UACC-903(+6) cells by labelling with a green and red
fluor, respectively. A mixture of the two fluorescently
labelled probes was hybridized to the DNA microarray.
This comparative hybridization method, coupled with
the doping of synthetic standards and an estimation of
statistically significant deviation from local background
variance, allowed a direct and quantitative comparison
of the relative abundance of individual DNA sequences
in this complex sample. We added a set of synthetic
poly(A)+-tailed 'mRNAs' to the purified mRNA from
each cell line as internal standards to assist in quantita-
tion and estimation of experimental variation intro-
duced during labelling and reading. Targets
complementary to these standards were included, in
duplicate, on the microarray. Based on these standards
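The use of spiked-in standards for quantitation can be illustrated with a minimal Python sketch. It assumes, for illustration only, that each synthetic standard was added in equal amounts to both mRNA samples, so that any departure of its red/green ratio from 1 reflects channel imbalance rather than biology. The function name and the choice of a median scale factor are hypothetical, not the authors' published procedure.

```python
import statistics

def normalize_by_standards(red, green, standard_ids):
    """Rescale the green channel so the median red/green ratio over the
    spiked-in standards is 1, then return normalized ratios for all elements.

    red, green: dicts mapping element id -> background-subtracted intensity.
    standard_ids: ids of the elements complementary to the synthetic standards.
    """
    std_ratios = [red[i] / green[i] for i in standard_ids]
    scale = statistics.median(std_ratios)  # channel imbalance estimated from standards
    return {i: red[i] / (green[i] * scale) for i in red}

# Hypothetical intensities in which the red channel reads 20% hot overall.
red = {"std1": 1200.0, "std2": 600.0, "geneA": 2400.0, "geneB": 120.0}
green = {"std1": 1000.0, "std2": 500.0, "geneA": 1000.0, "geneB": 500.0}
ratios = normalize_by_standards(red, green, ["std1", "std2"])
print({k: round(v, 2) for k, v in ratios.items()})
```

After rescaling, the standards sit at a ratio of 1 and the two test genes show their imbalance-corrected ratios (2-fold up and 5-fold down).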



spond to genes preferentially expressed in the tumori-
genic UACC-903 cell line, and the reddish spots corre-
spond to genes preferentially expressed in the
non-tumorigenic UACC-903(+6) cell line. Genes
expressed at approximately equal levels in the two cell
lines appear yellow or brown. A portion of the array at
higher magnification highlights the diverse pattern of
differential expression observed (Fig. 2b). In Fig. 2c, rec-
tangles corresponding to specific array elements are
coloured to reproduce the hue and intensity of the fluo-
rescent signal at each element. The hybridization signals
from a duplicated set of genes are shown juxtaposed, to
illustrate the reproducibility of the hybridization signals
for each gene.

To address the possibility that an apparent difference
in expression might result from experimental variables
unrelated to the difference in chromosomal composi-
tion between the two cell lines, we examined the vari-
ance in expression for 90 'housekeeping' genes. We
selected these genes based on the assumption that they
would not be differentially expressed between the two
cell lines. The averaged red/green ratio for this subset of
genes was 1.13. The averaged red/green ratio for the set
of five internal standards was 0.97 (n = 10). The vari-
ability in the expression level of the housekeeping genes
probably overestimates the experimental variability in
measuring differential expression. As a conservative stan-
dard, an absolute fluorescent signal (red or green) with
an intensity greater than that observed at the control
array elements containing total human genomic DNA
was considered to represent specific hybridization. Gene-
specific hybridization was therefore only considered sig-
nificantly different between samples if the following two
criteria were met: (i) the signal intensity (green or red)
exceeded this threshold; and (ii) the logarithm of the
red/green fluorescence signal ratio differed by >3 S.D.
from the mean logarithm of this ratio for the house-
keeping gene panel (that is, ratios <0.52 or >2.4).
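The two significance criteria can be written out as a short Python sketch. The function, its parameter names and the example intensities are hypothetical; the sketch only mirrors the rules stated above: an intensity floor set by the total genomic DNA elements, and a ±3 S.D. window on the log ratio estimated from a housekeeping panel.

```python
import math
import statistics

def significant_changes(elements, housekeeping_ids, genomic_threshold, n_sd=3.0):
    """Apply the two criteria described in the text to every array element.

    elements: dict mapping element id -> (red, green) signal intensities.
    housekeeping_ids: ids of the housekeeping panel used to estimate variability.
    genomic_threshold: signal observed at the total human genomic DNA elements.
    Returns (flagged, (low, high)); flagged maps id -> direction of the change.
    """
    hk_logs = [math.log(elements[i][0] / elements[i][1]) for i in housekeeping_ids]
    mean_log = statistics.mean(hk_logs)
    sd_log = statistics.stdev(hk_logs)
    low = math.exp(mean_log - n_sd * sd_log)   # ratio cut-offs implied by +/- n_sd
    high = math.exp(mean_log + n_sd * sd_log)

    flagged = {}
    for element_id, (red, green) in elements.items():
        # Criterion (i): the red or green signal must exceed the genomic-DNA threshold.
        if max(red, green) <= genomic_threshold:
            continue
        ratio = red / green
        # Criterion (ii): the log ratio lies outside mean +/- n_sd * SD of the panel.
        if ratio > high:
            flagged[element_id] = "red_higher"
        elif ratio < low:
            flagged[element_id] = "green_higher"
    return flagged, (low, high)

# Hypothetical intensities, for illustration only.
elements = {
    "hk1": (1000.0, 1000.0), "hk2": (1100.0, 1000.0), "hk3": (1200.0, 1000.0),
    "hk4": (900.0, 1000.0), "hk5": (1050.0, 1000.0),
    "geneUp": (5000.0, 500.0),    # strong signal, about 10-fold higher in red
    "geneDown": (200.0, 4000.0),  # strong signal, about 20-fold higher in green
    "geneFaint": (50.0, 5.0),     # 10-fold ratio, but below the intensity threshold
}
flagged, (low, high) = significant_changes(
    elements, ["hk1", "hk2", "hk3", "hk4", "hk5"], genomic_threshold=100.0)
print(flagged, round(low, 2), round(high, 2))
```

Because the window is symmetric in log space, its two cut-offs have a geometric midpoint equal to the panel's mean ratio. For the cut-offs reported in the text (0.52 and 2.4), that midpoint is about 1.12, close to the reported housekeeping average of 1.13.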

By these criteria, mRNA levels for 15/870 (1.7%) genes
were significantly diminished, while the mRNA levels
for 63/870 (7.3%) genes were significantly increased in
association with suppression of tumorigenicity by intro-
duction of chromosome 6. To test the reliability of
microarray hybridization results in identifying differen-
tially expressed genes, we analysed 5 genes by north-
ern analysis. In each case, the results of northern analysis
corroborated the differential gene expression identified
by microarray hybridization (Fig. 3).

Significant differences in expression between the
two cell lines identified several genes as candidates for
determining features of the tumorigenic phenotype of
the melanoma cells. For example, among the genes
detected with significantly higher expression
in the tumorigenic cells was the human brown locus pro-
tein (TRP-1/melanoma antigen gp75), an
abundant glycoprotein in melanocytic cells and a
melanosome membrane protein whose
expression is reduced when melanoma cells are
induced to differentiate.
Also expressed at a significantly higher level was a
variant of the mRNA encoding
Key to Fig. 1c: hybridization specificity controls (54); melanoma subtracted cDNAs (183); UniGene/EST cDNAs (687).

Fig. 1 Properties of cDNA microarrays. a, A fluorescent scan of DNA printed onto a poly-lysine coated slide. The DNA is stained with a DNA-specific fluorescent
dye, YOYO. The centre-to-centre spacing of adjacent spots is 450 µm, allowing the potential for up to 10,000 spots per 2.54 × 7.62 cm microscope slide. b, Effi-
cient blocking of hybridization to DNA repeats. Hybridization of fluorescein-labelled poly(dT) to arrays in the absence of competitor produces strong
hybridization to immobilized poly(dA) as well as to some cDNAs, such as the EST T64827 shown. Rhodamine-labelled cDNA (red) from the UACC-903 cell
line hybridized in the presence of poly(dA) blocker shows little if any signal at either site (Total H = total human). Similarly, hybridization with fluorescein-labelled
Cot1 DNA in the absence of competitor produces bright signal on immobilized Cot1 DNA, total human DNA and at some cDNA elements (presumed to con-
tain highly repeated sequences, such as R23416), while rhodamine-labelled cDNA (red) from the UACC-903 cell line produces little if any signal at these
locations when hybridized in the presence of excess unlabelled poly(dA) and human Cot1 DNA. The absence of signal at some cDNA locations following
UACC-903 cDNA hybridizations also indicates that the PCR-amplified plasmid vector sequences at all cDNA targets do not contribute significant hybridiza-
tion signal. c, Schematic of the array organisation. Robotic printing from 96-well microtiter trays was carried out with 4 print heads, spaced to fit into 4 adja-
cent microtiter wells. This maps the contents of each tray into four separate quadrants on the glass slide. A colour-coded map of the general distribution of
target types in each of the resulting quadrants is shown.
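The capacity figure in the caption can be checked with simple arithmetic. This sketch idealizes the slide as a margin-free square grid at 450 µm pitch, which is an assumption; real layouts lose some area to margins and labels. The function name is hypothetical.

```python
import math

def spots_per_slide(width_mm, length_mm, pitch_mm):
    """Count grid positions on a slide at a given centre-to-centre pitch,
    assuming a simple square grid with no margins (an idealization)."""
    per_row = math.floor(width_mm / pitch_mm)
    per_column = math.floor(length_mm / pitch_mm)
    return per_row * per_column

# 2.54 cm x 7.62 cm slide at 450 um pitch, as stated in the caption.
capacity = spots_per_slide(25.4, 76.2, 0.45)
print(capacity)
```

The result, 56 × 169 = 9,464 positions, is on the order of the 10,000 spots quoted in the caption.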



els were elevated by the addition of a normal chromo-
some 6 (… genes) are known to be activated by IFN-γ, a
cardinal proinflammatory cytokine that, among other
activities, induces expression of the gene products of the
MHC class II locus. For example, the mRNA encoding
monocyte chemotactic protein 1 (MCAF/MCP-1), a
cytokine that induces monocyte chemotaxis and activa-
tion, was more than 10-fold less abundant in the
tumorigenic cell line. In the skin, MCP-1 is critical in the
regulation of cutaneous monocyte trafficking, and
elevated expression plays a role in suppression of tumour
growth and metastasis. The mechanism by which
these interferon-γ-regulated genes are induced in UACC-
903 cells by transfer of a normal chromosome 6 remains
to be determined. It is worth noting, however, that the
interferon-γ receptor gene is localized to the distal long
arm of human chromosome 6.

Finally, several genes that showed >10-fold higher
expression in the suppressed UACC-903(+6) cells have
previously been recognized in other models of tumour
suppression. Most notably, there was elevated expres-
sion of the mRNA encoding WAF1 (p21), a key media-
tor of tumour suppression by p53 (ref. 18). The p21
protein had previously been identified as a melanoma
differentiation-associated antigen (termed mda-6).
In melanoma cell lines suppressed for metastasis by the
introduction of chromosome 6, expression of WAF1
(p21) mRNA and protein correlates inversely with



These results provide a wide view of the diverse sys-
tems that are altered in this model system of tumori-
genicity, and focus attention on specific gene products
and pathways that may be of particular importance in
this tumour type.

Our ability to classify human cancers in a way that
reflects the underlying molecular pathology or that
anticipates their potential for progression or response
to treatment remains primitive. Using cDNA microar-
rays to define alterations in gene expression associated
with a specific cancer may be an efficient way to uncov-
er clues to the specific molecular derangements that con-
tribute to its pathogenesis and thus identify potential
targets for therapeutic intervention. Moreover, recogni-
tion of pathognomonic alterations in gene expression
might provide a basis for improved diagnosis and mol-
ecular classification of cancers and thus allow selection of
the most appropriate therapeutic strategies.

Public databases of human expressed gene sequences
contain partial sequences of at least 40,000 different
human genes, and efforts to develop a human tran-
script map have progressed rapidly. Based on the high
yield of information obtained using an array of <1,000
different genes, a more comprehensive survey of gene
expression patterns, using a more complete array of
human genes, will likely provide a rich source of new
and useful insights into human biology and a deeper
understanding of the gene pathways involved in the
Fig. 2 a, Cy3-labelled cDNA (orange-red) from UACC-903(+6) and fluorescently labelled cDNA (green) from UACC-903, combined as the appropriate colour channels in a single image; arrows indicate elements corresponding to genes analysed by northern blotting (Fig. 3). b, A magnified image of the boxed area of the array, including three cDNAs identified by arrows in a, among them MCAF/MCP-1 (r/g ratio >10). c, Simplified representation of the hybridization results: the average colour ratio extracted from each array target determines the hue of each box, and boxes follow their original order in the microtiter plate; duplicated sets, as in the first two rows, show the reproducibility of the hybridization signals. Numbered arrows indicate the locations within the array corresponding to genes analysed by northern blotting in Fig. 3.



Methods 

Generation of microarrays, hybridization, scanning. Prep-
aration of glass microscope slides and subsequent robotic
printing of DNA was carried out in a manner similar to that
described previously. Precleaned slides were treated with
poly-L-lysine solution (Sigma) to form an adhesive surface for
printing. PCR products, purified by ethanol precipitation, were
resuspended in 3× SSC. A custom-built arraying robot picked
up and deposited small volumes (nanolitres) of DNA onto
the slides. After printing, the slides were washed in a 0.2% SDS
solution. The remaining bound DNA was denatured by im-
mersing the slides in 95 °C distilled water for 2 min, followed by
a brief wash in 95% ethanol. DNA was UV crosslinked to the
slides (Stratagene Stratalinker, … mJ). To prevent non-specific
probe binding, the slides were blocked by immersion in a solution
of 70 mM succinic anhydride dissolved in 0.1 M boric acid, pH
8.0, containing …% 1-methyl-2-pyrrolidinone (Aldrich).
Additional protocols and a parts list pertaining to microarray
fabrication can be obtained from http://cmgm.stanford.edu/pbrown.

Purified, labelled cDNA was resuspended in 11 µl of 3.5× SSC
containing 4 µg of poly(dA), 15 µg of tRNA, 4 µg of
human Cot1 DNA (Gibco BRL) and … µl of 10% SDS. Prior to
hybridization, the solution was boiled for … min, then allowed to
cool to room temperature. Hybridization was carried out at

A separate scan, using the appropriate excitation line,
was done for each of the two fluorophores used. Data were col-
lected at a resolution of … microns/pixel with … bits
of depth.

Probe preparation and labelling. RNA was extracted from cells
using the TRIzol reagent (Life Technologies Inc.), following the
manufacturer's directions. cDNA probes were synthesized from
singly oligo(dT)-selected (Pharmacia) mRNA pools. Fluorescently
labelled cDNA was prepared from mRNA by oligo(dT)-primed
polymerization using SuperScript II reverse transcriptase (Life
Technologies Inc.). The pool of nucleotides in the labelling
reaction was 0.5 mM dGTP, dATP and dCTP and 0.2 mM dTTP.
Fluorescent nucleotides, rhodamine 110 dUTP (Perkin Elmer Cetus)
or Cy3-dUTP (Amersham), were present at 0.1 mM. Probes were
purified by gel chromatography (ChromaSpin) and
ethanol precipitation.

Selection of cDNA elements and generation of control tem-
plates. Synthetic cDNAs were prepared by cloning random
BamHI- and HindIII-ended fragments of E. coli DNA in the vec-
tor pSP64 poly(A) (Promega). After linearization of the plasmid
DNA, poly(A)+-tailed RNA complementary to the insert was
synthesized from the resident SP6 promoter. Prior to use, the
synthesized RNAs were
Fig. 3 Northern hybridization substantiating the consistency of the cDNA microarray results. Corresponding locations within the cDNA microarray illustrated in Fig. 2a are provided for 1) WAF1/p21; 2) MARCKS; 3) collagenase; 4) MCAF/MCP-1; 5) α1-antichymotrypsin; and 6) β-actin. The signal detected by a radio-labelled β-actin probe represents a control for loading variance; the red/green ratio observed for β-actin on the cDNA microarray (Fig. 2a,c) was 1.04.

to the UniGene EST clustering system. The second largest group of
clones consisted of 183 sequenced cDNA clones generated by
subtraction of cDNA from the chromosome-6-suppressed
non-tumorigenic UACC-903(+6) cell line with cDNA from its parental
tumorigenic cell line UACC-903 (ref. 9). Approximately 100 additional
genes (total 870 genes arrayed) were obtained from EST libraries on
the basis of their expression pattern (tissue specific, and so on). Each
array included the following hybridization controls: plasmid vector,
lambda, φX174 phage, total human DNA, human Cot1 DNA and
poly(A). The synthetic standards used for normalization of signals
in each wavelength were also arrayed. Controls were included in
each quadrant of the array to assess the reproducibility of the
hybridization signal. Two plates of cDNA clones (derived from the
subtracted library) were also arrayed in duplicate. Fidelity of the
UniGene array relative to dbEST was tested by sequencing a random
sample of 11 clones used for microarray construction. All sequences
were identical with the corresponding dbEST entries. A listing of
cDNAs comprising this microarray which were derived from the
UniGene and 'housekeeping' panels can be obtained from
http://www.nchgr.nih.gov/DIR/LCG/ARRAY/expn.html.

Northern blot analysis. Total RNA, 10 µg per lane, was electro-
phoresed in 1.2% agarose-formaldehyde gels and transferred
onto nylon membrane (Hybond-N+, Amersham) by capillary
blotting overnight. For DNA probes, insert fragments from a
Soares cDNA library were obtained by vector PCR for
p21, MARCKS, α1-antichymotrypsin and β-actin. Probes for
fibroblast collagenase and MCAF/MCP-1 were isolated from a
UACC-903(+6) enriched cDNA library, with all probes
labelled by random priming. Filters were washed to a strin-
gency of 0.1× SSC at … °C for 20 min.

Web sites. http://cmgm.stanford.edu/pbrown for protocols and
parts list pertaining to microarray fabrication.
http://www.nchgr.nih.gov/DIR/LCG/ARRAY/expn.html for a
listing of cDNAs comprising this microarray which were
derived from the UniGene and 'housekeeping' panels.
GENOME METHODS

A DNA Microarray System for Analyzing 
Complex DNA Samples Using Two-color 
Fluorescent Probe Hybridization 

Dari Shalon, Stephen J. Smith, and Patrick O. Brown

Howard Hughes Medical Institute and Departments of Biochemistry and Molecular and Cellular
Physiology, Stanford University, Stanford, California 94305



Detecting and determining the relative abundance of diverse individual sequences in complex DNA samples is
a recurring experimental challenge in analyzing genomes. We describe a general experimental approach to
this problem, using microscopic arrays of DNA fragments on glass substrates for differential hybridization
analysis of fluorescently labeled DNA samples. To test the system, 864 physically mapped λ clones of yeast
genomic DNA, together representing >75% of the yeast genome, were arranged into 1.8-cm × 1.8-cm arrays,
each containing a total of 1744 elements. The microarrays were characterized by simultaneous hybridization
of two different sets of isolated yeast chromosomes labeled with two different fluorophores. A laser
fluorescent scanner was used to detect the hybridization signals from the two fluorophores. The results
demonstrate the utility of DNA microarrays in the analysis of complex DNA samples. This system should
find numerous applications in genome-wide genetic mapping, physical mapping, and gene expression studies.



Many problems in genome analysis depend on
determining what specific sequences are repre-
sented in a complex DNA or RNA sample and at
what abundance, for example, what genes are
represented in a specific chromosome band or
YAC clone, what intervals are amplified or de-
leted in a particular cancer cell, or what genes are
expressed in specific cells under specific condi-
tions. As a general approach to this problem, we
have developed a system for making microarrays
of DNA samples on glass substrates, probing
them by hybridization with complex fluorescent-
labeled probes, and using a laser-scanning micro-
scope to detect the fluorescent signals represent-
ing hybridization. Fluorescent labeling allows for
simultaneous hybridization and separate detec-
tion of the hybridization signal from two or more
probes. This in turn allows very accurate and re-
liable measurement of the relative abundance of
specific sequences in two complex samples.

RESULTS 

Array Hybridization Pattern

Figure 1 shows the two-color fluorescent scan of
a yeast genomic array following hybridization
with a mixed probe consisting of lissamine-
labeled DNA from the 6 largest yeast chromo-
somes together with fluorescein-labeled DNA
from the 10 smallest yeast chromosomes. A red
color indicates that yeast sequences present in
the lissamine-labeled hybridization probe hy-
bridized to an array element. A yellow-green
color indicates that yeast sequences present in
the fluorescein-labeled hybridization probe hy-
bridized to an array element. An orange color in-
dicates cross-hybridization of both chromosome
pools to an array element (e.g., dispersed repeti-
tive elements, such as Ty1 elements).

Present address: Synteni, Inc., Palo Alto, California.
Corresponding author.
E-MAIL pbrown@cmgm.stanford.edu; http://cmgm.stanford.edu/pbrown;
FAX (415) 723-1199.

Each clone was spotted twice, resulting in du-
plicate hybridization patterns in adjacent quad-
rants of the array. Control DNA spots, which
were randomly amplified in the same manner as
the λ clone array elements, are located in the bot-
tom corner of each quadrant. "A" points to a pair
of spots containing total yeast genomic DNA.
These spots appear orange because both chromo-
some pools hybridized to yeast genomic DNA.
The negative controls are as follows: "B" points
to a pair of spots of wild-type λ DNA, "C" points
to a pair of human genomic DNA spots, and "D"
points to a pair of φX174 DNA spots. The lack of
a hybridization signal at these three negative
control spots indicates that the hybridization was
specific for yeast sequences.



SHALON ET AL.




Figure 1 Two-color fluorescent scan of a 1.8-cm × 1.8-cm yeast array
of λ clones of yeast genomic DNA. The DNA spots are spaced at a
distance of 380 µm from center to center. A probe mixture consisting of
DNA from the 6 largest yeast chromosomes (4, 7, 12, 13, 15, 16) labeled
with lissamine (red dots) and DNA from the 10 smallest yeast chromo-
somes (1, 2, 3, 5, 6, 8, 9, 10, 11, 14) labeled with fluorescein (yellow-
green dots) was hybridized to the array. A pair of yeast genomic DNA
spots (A) served as a positive control. The three negative controls are λ
DNA (B), human genomic DNA (C), and φX174 DNA (D).



Karyotype Depiction of the Array Hybridization
Pattern

The inserts contained in the arrayed λ clones
have been mapped physically (Riles et al. 1993).
The clones are arrayed in a random but known
order on the array. Therefore, using the identity
of each clone along with its physical map infor-
mation, the pattern of hybridization to the yeast
array can be represented in the form of a karyo-
type of the yeast genome, as shown in Figure 2.
The color of any segment of the ideogram repre-
senting an individual chromosome on the karyo-
type is directly determined by the ratio of red and
green hybridization signals at the array positions
of the corresponding clones. The lengths of the
discrete colored segments of each chromosome



inserts. The chromosome seg-
ments colored black represent ei-
ther intervals of the genome that
are not represented by clones in
the library (90%) or false-negative
hybridization signals on the array
(10%). Most of these false nega-
tives are attributable to failures of
the PCR amplification of the λ
clones, though occasional failures
of the arraying process or nonuni-
form surface preparation could ac-
count for a small fraction of the
false-negative signals. The large
gap on chromosome 12 is the re-
gion coding for ribosomal DNA
that was not represented among
the arrayed clones. Genomic inter-
vals represented by overlapping
clones were assigned a color based
on the hybridization signals of
only one of the overlapping
clones, chosen at random.
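The mapping from array signals to the ideogram can be sketched as follows. Only the general rule comes from the text: color set by each clone's red/green ratio, black for missing signal, orange for both pools, and one randomly chosen clone for overlapping intervals. The detection threshold, the 2-fold ratio cut-off and all function names are illustrative assumptions.

```python
import math
import random

def clone_color(red, green, detect_threshold, ratio_cutoff=2.0):
    """Map one clone's two-channel signal to an ideogram color.

    red: lissamine signal (6-largest-chromosome pool); green: fluorescein signal
    (10-smallest-chromosome pool). detect_threshold and ratio_cutoff are
    illustrative parameters, not values taken from the paper.
    """
    if max(red, green) < detect_threshold:
        return "black"            # no detectable hybridization (false negative)
    ratio = red / green if green > 0 else math.inf
    if ratio >= ratio_cutoff:
        return "red"
    if ratio <= 1.0 / ratio_cutoff:
        return "green"
    return "orange"               # both pools hybridize, e.g. dispersed repeats

def ideogram_segments(clones, detect_threshold, seed=0):
    """clones: list of (chromosome, start, end, red, green) in map coordinates.

    Overlapping clones on one chromosome are merged into a single interval whose
    color comes from one clone chosen at random. Intervals with no clone at all
    are simply absent from the output (they would be drawn black).
    """
    rng = random.Random(seed)
    segments, group = [], []

    def flush():
        if group:
            chosen = rng.choice(group)
            color = clone_color(chosen[3], chosen[4], detect_threshold)
            segments.append((group[0][0], min(c[1] for c in group),
                             max(c[2] for c in group), color))
            group.clear()

    for clone in sorted(clones):
        chrom, start = clone[0], clone[1]
        if group and (chrom != group[0][0] or start > max(c[2] for c in group)):
            flush()
        group.append(clone)
    flush()
    return segments

# Hypothetical clones (coordinates in kb, intensities arbitrary).
clones = [
    (4, 0, 40, 900.0, 50.0), (4, 30, 70, 800.0, 60.0),  # overlapping, both red
    (4, 100, 140, 10.0, 5.0),                           # too faint: black
    (1, 0, 40, 40.0, 700.0),                            # small chromosome: green
]
for segment in ideogram_segments(clones, detect_threshold=100.0):
    print(segment)
```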

Note that in this representa-
tion of a yeast karyotype, the larg-
est six chromosomes are mainly
colored red. This indicates that
most of the arrayed clones that
were mapped previously to these
six large chromosomes hybridized
primarily to the lissamine-labeled
probe prepared from the corre-
sponding purified chromosomes.
Conversely, the smallest 10 chro-
mosomes are mainly colored green
in this image, matching the origi-
nal CHEF gel isolation of the chro-
mosomes used as the hybridization probe. The
experiment was repeated with the yeast genome
split into six discrete chromosome pools contain-
ing 2-4 chromosomes per pool using CHEF gel
electrophoresis. The chromosomes in each pool
were extracted from the gel, amplified, and fluo-
rescently labeled. The six chromosome pools
were hybridized to six separate yeast arrays.
Forty-four λ clones gave a positive hybridization
signal on all six arrays, indicating that they con-
tain yeast repetitive sequences (data not shown).
These 44 clones and 10 clones with very weak
hybridization signals were not included in the
data set used to produce this karyotype.

There were ~40 anomalous clones, which ap-
pear in this karyotype representation as green
bands on the otherwise red chromosomes or as red
bands on the otherwise green chromosomes.
Four randomly chosen examples of these anoma-
lous clones were analyzed by hybridizing the
clones to vertical strips cut from a Southern blot
of CHEF gel-separated yeast chromosomes. In
each case, the hybridization patterns of the
anomalous clones corroborated the chromo-
somal locations assigned by the microarray hy-
bridization results (data not shown). Two clones
that were thought to map to the 10 smallest chro-
mosomes were found to hybridize preferentially
to the probe representing the 6 largest chromo-
somes and thus appear as anomalous red bands
on the karyotype. Both hybridized to one of the
six largest chromosomes on the Southern blot.
Similarly, two clones that appear as anomalous
green bands on the karyotype were found to hy-
bridize to one of the 10 smallest chromosomes on
the Southern blot. Thus, the anomalous clones
are probably the result of sample tracking errors
or, possibly, of errors in the published restriction-
digest-based physical map on which the karyo-
type representation was based (Riles et al. 1993).

Figure 2 Computer-generated ideogram repre-
senting a karyotype of S. cerevisiae, based on the
normalized hybridization signals from the array
shown in Fig. 1. Note that the 6 largest chromo-
somes are mainly red and the 10 smallest chromo-
somes are mainly green. Black stripes represent in-
tervals not represented by clones in the array or for
which the corresponding clones gave false-negative
hybridization signals.

DISCUSSION 

The DNA microarray hybridization system re-
ported here is conceptually and functionally
similar to fluorescent in situ hybridization (FISH)
to metaphase chromosomes, with three impor-
tant differences. First, the target elements of the
microarrays can, in principle, be any length or
composition, from megabase YAC clones or mi-
crodissected chromosome bands to individual
cDNA clones, to short oligonucleotides. This ver-
satility allows the user to choose characteristics,
such as the mapping resolution and genetic com-
plexity of each array element, to suit a particular
application. Second, the hybridization signals are
localized to discrete elements of known size and
location, making them easier to identify and
quantitate than the hybridization signals from
irregularly shaped metaphase spreads. Third, mi-
croarrays are more consistent and potentially
amenable to automated production, hybridiza-
tion, and data analysis than metaphase spreads.

Arrays of DNA samples on porous mem-
branes, for example, dot blots, have long been
used as a basic tool in molecular biology. Dot-
blot membranes are usually at least 8 × 12 cm in
size, require the use of milliliter volumes of hy-
bridization solution, and are limited, owing to
autofluorescence and scattering, to radioactive,
chemiluminescent, and colorimetric hybridiza-
tion detection methods (Ross et al. 1992). Micro-
arrays made on glass surfaces, on the other hand,
can be mass-produced and are comparatively in-
expensive, convenient, and compatible with
fluorescent hybridization detection methods.
Furthermore, a glass surface, when appropriately
treated, has very low nonspecific binding of la-
beled hybridization probes, resulting in lower
backgrounds than are encountered typically with
porous membranes. For hybridizations with very
complex probes, the concentration of the labeled
probe DNA is a limiting factor in the sensitivity
of the assay. Minimizing the volume of the probe
solution in a hybridization, by restricting the tar-
get to a small area and by using a nonporous
substrate, makes it practical to achieve very high
probe concentrations.

One important advantage of fluorescently la-
beled probes is that, unlike most radioactive and
chemiluminescent signals, fluorescent signals do
not disperse and therefore allow for very dense
array spacing. A unique, and probably the most
important, advantage of fluorescent probes is
that the hybridization signals from two or more
differently labeled probes hybridized to the same
target element can be detected separately. In this
way, two-color hybridization detection allows for
a direct and quantitative comparison of the relative
abundance of specific sequences between two
probe mixtures that are hybridized competitively
to a single array. The absolute intensity of a hy-
bridization signal at a particular element in an
array can vary owing to experimental factors
such as variations in the amount of DNA depos-
ited on the array, variations in the hybridization
or wash conditions between experiments, or
variations in the hybridization characteristics of
the different DNA sequences on the array. The
ratio of the two signals at any element in an ar-
ray, however, is relatively insensitive to these
confounding factors because they affect both
probe mixtures equivalently. This ratio therefore
accurately reflects the relative abundance of the
cognate sequence in the two probe samples. This
is the principle underlying the technique of com-
parative genomic hybridization (CGH), which is
used to detect changes in the copy number of
specific chromosomes or chromosomal regions
(Kallioniemi et al. 1992). CGH is based on mea-
suring the relative fluorescent hybridization in-
tensities of two genomic-complexity hybridiza-
tion probes, for example, probes representing ge-
nomic DNA from normal and affected tissue
samples, which are labeled with two distinct fluo-
rophores and hybridized simultaneously to a
metaphase spread. DNA microarray representa-
tions of the human genome may provide a more
convenient and higher resolution alternative to
metaphase chromosomes for CGH.
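A toy model makes the argument about signal ratios concrete. It assumes, as the text does, that confounding factors such as the amount of DNA deposited scale both probe signals at a spot by the same spot-specific factor. The function name and all values are hypothetical.

```python
def two_color_signals(abundance_a, abundance_b, spot_factor):
    """Toy model: each channel's signal is the probe abundance times a
    spot-specific factor (DNA deposited, local hybridization efficiency)
    that, by assumption, affects both probe mixtures equally."""
    return abundance_a * spot_factor, abundance_b * spot_factor

# One sequence present at 2:1 in sample A vs sample B, on three spots that
# received very different amounts of target DNA (hypothetical values).
spot_factors = (0.2, 1.0, 5.0)
signals = [two_color_signals(2.0, 1.0, f) for f in spot_factors]
for f, (a, b) in zip(spot_factors, signals):
    print(f"spot factor {f}: A={a:.2f}, B={b:.2f}, A/B={a / b:.2f}")
```

The absolute intensities vary 25-fold across the three spots while the A/B ratio stays at 2. Dye-specific effects, which do not scale both channels equally, would not cancel in this model.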

Cross-hybridization between related se-
quences is an important problem faced by any
hybridization-based assay, including the DNA
microarray assay described here. Studies are now
in progress to quantitate the extent of cross-
hybridization between related sequences of vary-
ing homology and length in DNA microarray
hybridizations. The stringency of hybridization
and washing can be controlled by varying the salt
concentration and temperature as in conven-
tional membrane-based hybridizations. Cross-
hybridization caused by repetitive sequences can
be minimized by prehybridization of the probe or
array with a vast excess of unlabeled copies of the
repetitive sequences.

Alternative methods have been described for
making microarrays of very short DNA se-
quences, involving photolithography (Pease et
al. 1994) or physical masking (Maskos and South-
ern 1992) methods. These in situ synthesis meth-
ods are inherently limited to low complexity ar-
ray elements consisting of oligonucleotides. For



hybridization is improved by using DNA frag-
ments substantially longer than oligonucleo-
tides. Moreover, the in situ synthesis approaches
to array fabrication depend on prior knowledge
of the sequence to be recognized by each array
element. The approach described here makes mi-
croarrays by transferring tiny volumes of DNA
samples from microwell storage plates to a solid
substrate. Thus, nucleic acids (or other mol-
ecules) of virtually any length or any origin can
be arrayed, and knowledge of their sequences is
not required.

The arrays used in these experiments do not represent the maximal achievable density of elements. We have found that the spacing between the spots can be decreased by shrinking the contact area of the printing tip and by increasing the hydrophobicity of the glass surface. Microarrays with 100-µm feature size have been tested successfully in pilot experiments (data not shown). Assuming the projected availability of the appropriate physically mapped human genomic clones (Hudson et al. 1995), arrays at 100-µm spacing would allow for 10,000 discrete intervals of the human genome to be represented in a 1-cm² array. Such an array could be used for mapping at a resolution of <0.5 Mb. Experiments are in progress to explore the feasibility of such arrays.
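The density figures above follow from simple arithmetic, sketched here in Python. The haploid human genome size of about 3,000 Mb and the assumption that clones tile the genome evenly are ours, not the paper's.

```python
GENOME_MB = 3_000  # approximate haploid human genome size (assumption)


def elements_in_square(side_um: int, spacing_um: int) -> int:
    """Number of array elements on a square grid with the given pitch."""
    per_side = side_um // spacing_um
    return per_side * per_side


def mean_interval_mb(genome_mb: float, n_elements: int) -> float:
    """Average genomic span per element if clones tile the genome evenly."""
    return genome_mb / n_elements


n = elements_in_square(side_um=10_000, spacing_um=100)   # 1 cm = 10,000 um
print(n)                                 # 10000
print(mean_interval_mb(GENOME_MB, n))    # 0.3, i.e. below 0.5 Mb
```

A 100 × 100 grid at 100-µm pitch fills 1 cm², and dividing roughly 3,000 Mb by 10,000 elements gives about 0.3 Mb per element, consistent with the <0.5 Mb resolution quoted.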

Our initial motivation for developing these microarrays arose from the need for abundant and inexpensive genomic arrays for genomic mismatch scanning (GMS) (Nelson et al. 1993), a method of genetic linkage analysis based on identification of the regions of "identity by descent" between affected relative pairs using a single complex-probe hybridization to an array of genomic clones. Experiments using these arrays to map quantitative trait loci in yeast by GMS are currently in progress (J. deRisi, D. Lashkari, L. Penland, L. McAllister, J. McCusker, R. Davis, and P.O. Brown, unpubl.).

Microarrays of cDNA clones, prepared using the system described here, have been used for quantitative monitoring of gene expression patterns in Arabidopsis (Schena et al. 1995), S. cerevisiae (D. Lashkari, J. deRisi, L. Penland, P.O. Brown, and R. Davis, unpubl.), and human tissues (J. deRisi, M. Bittner, P. Meltzer, L. Penland, J. Trent, and P.O. Brown, unpubl.). We anticipate that DNA microarrays of the kind described here will be useful in additional applications for which conventional dot blots, high-density gridded arrays on porous membranes, or



DNA MICROARRAYS FOR ANALYZING COMPLEX DNA SAMPLES



tions include comparative genomic hybridization (Kallioniemi et al. 1992), sequencing by hybridization (Drmanac et al. 1993), physical mapping of cloned or amplified sequences (Billings et al. 1991), and economical distribution of reagents for integrated genetic and physical mapping based on a common set of arrayed clones (Zehetner and Lehrach 1994).

METHODS 

Amplification of Target DNA Elements 

The array elements were prepared from physically mapped λ clones (Riles et al. 1993). The λ clones were amplified using randomly primed polymerase chain reaction (PCR) based on published and unpublished protocols (Bohlander et al. 1992; S. Nelson, unpubl.). The phage lysates were amplified in a 10-µl PCR reaction using 5 µM final concentration of primer A (GCTATCTTCAAGATCA followed by a degenerate 3' tail of N residues), 200 µM dNTPs, and 1 unit of Taq polymerase. Round A consisted of five cycles at 94°C for 1 min, 25°C for 1.5 min, 25-72°C over 7 min, and 72°C for 3 min using Taq polymerase (EMBI). For round B, the reaction volume was brought up to 100 µl for a final concentration of 2 µM of primer B (GCTATCTTCAAGATCA), 200 µM dNTPs, and 4 units of Taq polymerase. Round B consisted of 30 cycles of denaturation for 1 min, 56°C for 2 min, and 72°C for 3 min. The amplification was performed in 96-well plates using crude phage lysates as the templates, resulting in an amplification of both the 35-kb λ vector and the 5-kb to 15-kb yeast insert sequences as a distribution of PCR products between 250 bp and 1500 bp in length.
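The two-round program above can be captured as a small data structure, which also makes the programmed run time explicit. This is a sketch only: the class names are hypothetical, instrument ramp time between steps is ignored (only the explicit 7-min ramp of round A is counted), and the round B denaturation temperature is not given in the text, so that step is labeled generically.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    label: str
    minutes: float


@dataclass
class Round:
    name: str
    cycles: int
    steps: List[Step] = field(default_factory=list)

    def hold_minutes(self) -> float:
        """Total programmed time; instrument ramping between steps is ignored."""
        return self.cycles * sum(step.minutes for step in self.steps)


round_a = Round("A", 5, [
    Step("94 C", 1.0),
    Step("25 C", 1.5),
    Step("ramp 25-72 C", 7.0),   # the slow ramp is itself a programmed step
    Step("72 C", 3.0),
])
round_b = Round("B", 30, [
    Step("denaturation", 1.0),   # temperature not given in the text above
    Step("56 C", 2.0),
    Step("72 C", 3.0),
])

for r in (round_a, round_b):
    print(r.name, r.cycles, r.hold_minutes())
# A 5 62.5
# B 30 180.0
```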

The PCR products were purified and transferred into TE (10 mM Tris, 1 mM EDTA at pH 8.0) buffer using Sephadex G50 gel filtration (Pharmacia) and evaporated to dryness at room temperature overnight. Each of the 864 amplified λ clones was rehydrated in 15 µl of 3× SSC (20× SSC = 3 M NaCl, 0.3 M Na3 citrate) in preparation for spotting onto the glass under normal room temperature conditions.

Figure 3 The layout of the arraying machine. All motions are under computer control. For more details of the arraying machine, see web page http://cmgm.stanford.edu/pbrown.



Preparation of DNA Microarrays 

The microarrays were fabricated on poly-L-lysine coated microscope slides (Sigma). A custom-built arraying machine, consisting of four tweezer-like printing tips mounted 9 mm apart on a computer-controlled robotic stage (Shalon 1996), loaded 1 µl of the concentrated PCR product directly from corresponding clusters of four wells of 96-well storage plates and deposited ~5 nl of each sample onto each of 40 slides. Surface tension loaded the sample into the printing tip directly from the microwell plate and held the sample in the tip during the printing operation. Printing was achieved by lightly tapping the tip against the glass surface. The open-capillary design allowed for rapid rinsing and drying of the tips between samples. Figure 3 shows the layout of the arraying machine. Figure 4 shows a detailed view of the four printing tips and the staggered printing pattern on the microscope slides. Adjacent samples were spotted 380 µm apart on the slides. After each set of four samples was printed onto 40 slides, the printing tips were rinsed with a jet of water for 2 sec and then dried by lowering the tips onto a sponge for 2 sec. The process was repeated for all 864 samples and eight control spots.
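The bookkeeping implied by a four-tip printer can be sketched as follows. The layout is a hypothetical simplification, not the staggered pattern of Figure 4: we assume the tips sit in one row at the stated 9-mm pitch, samples are assigned to tips round-robin, and each tip fills its own square block at the stated 380-µm spot pitch. The sketch checks that one tip's block fits inside the 9-mm tip spacing and totals the 2-sec rinse plus 2-sec drying per load cycle.

```python
import math
from typing import Tuple

TIP_PITCH_UM = 9_000    # tips mounted 9 mm apart (from the text)
SPOT_PITCH_UM = 380     # adjacent samples spotted 380 um apart (from the text)
N_TIPS = 4
WASH_S = 2 + 2          # 2 s water jet + 2 s sponge drying per load cycle


def block_side(n_samples: int, n_tips: int = N_TIPS) -> int:
    """Side (in spots) of the smallest square block holding one tip's share."""
    per_tip = math.ceil(n_samples / n_tips)
    return math.isqrt(per_tip - 1) + 1   # integer ceil(sqrt(per_tip))


def spot_position(sample: int, n_samples: int,
                  n_tips: int = N_TIPS) -> Tuple[int, int]:
    """Hypothetical (x, y) in um: tip k prints samples k, k + n_tips, ... in its own block."""
    side = block_side(n_samples, n_tips)
    tip, slot = sample % n_tips, sample // n_tips
    row, col = divmod(slot, side)
    return tip * TIP_PITCH_UM + col * SPOT_PITCH_UM, row * SPOT_PITCH_UM


def wash_overhead_s(n_samples: int, n_tips: int = N_TIPS) -> int:
    """Seconds spent rinsing and drying, one wash per load of n_tips samples."""
    return math.ceil(n_samples / n_tips) * WASH_S


side = block_side(864)
print(side, (side - 1) * SPOT_PITCH_UM < TIP_PITCH_UM)   # 15 True
print(spot_position(5, 864))                             # (9380, 0)
print(wash_overhead_s(864))                              # 864
```

`math.isqrt` requires Python 3.8 or later.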

After the spotting operation was complete, the slides were rehydrated in a humid chamber at room temperature for 2 hr, baked in an 80°C vacuum oven for 2 hr, then rinsed in 0.1% sodium dodecyl sulfate (SDS) to remove unadsorbed DNA. To reduce nonspecific adsorption of the labeled hybridization probe to the poly-L-lysine coated glass surface, the slides were treated with succinic anhydride. One gram of succinic anhydride was dissolved in 100 ml of 1-methyl-2-pyrrolidinone and then 100 ml of 0.2 M boric acid (pH 8.0) was added. The arrays were soaked in this solution for 10 min and then rinsed in distilled water four times for 5 min each. Immediately before use, the arrayed DNA elements were denatured by placing the slide in distilled water at 90°C for 2 min.



Amplification and Labeling of Hybridization Probe

The 16 chromosomes of Saccharomyces cerevisiae were separated using a contour-clamped homogeneous electric field (CHEF) agarose gel apparatus (Bio-Rad) (Chu et al. 1986). The 6 largest chromosomes were isolated in one gel slice and the smallest ten chromosomes in a second gel slice. The DNA from each slice was recovered using a gel extraction kit (Qiagen) and randomly amplified in a manner similar to that used in amplifying the target λ clones (Grothues et al. 1993). The main difference between this amplification procedure and the one used for the λ array elements is a filtration step between rounds A and B to remove primer-dimers and the use of a random 9-mer 3' end on primer A. Following amplification, 2.5 µg of each of the amplified chromosome pools were separately random-primer labeled using Klenow polymerase (Amersham) with a lissamine-conjugated nucleotide analog (DuPont NEN) for the pool containing the 6 largest chromosomes and with a fluorescein-conjugated nucleotide analog (BMB) for the pool containing the smallest 10 chromosomes. The two fluorescent-labeled pools were mixed and concentrated using an ultrafiltration device (Amicon).

Figure 4 A close-up view of the four open-capillary printing tips. The tips are 9 mm apart and fit into four adjacent wells of a standard microwell plate and print arrays in a staggered fashion on microscope slides. For more details of the printing tips, see web page http://cmgm.stanford.edu/pbrown.



Hybridization 

Five micrograms of the hybridization probe, consisting of both chromosome pools in 7.5 µl of TE, was denatured in a boiling water bath and then snap-cooled on ice. Concentrated hybridization solution (2.5 µl) was added to a final concentration of 5× SSC/0.1% SDS. The entire 10 µl of probe solution was transferred to the array surface, covered with a coverslip, placed in a custom-built single-slide humidity chamber, and incubated in a 60°C water bath for 12 hr. The custom-built waterproof slide chamber has a cavity just slightly bigger than a microscope slide and was kept at 100% humidity internally by the addition of 2 µl of water in a corner of the chamber. The slide was rinsed in 5× SSC/0.1% SDS for 5 min and then in 0.2× SSC/0.1% SDS for 5 min. All rinses were at room temperature. The array was then air dried, and a drop of antifade (Molecular Probes) was applied to the array under a 24-mm × 30-mm coverslip in preparation for scanning.
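The paper does not state the strength of the concentrated hybridization solution, but it follows from C1V1 = C2V2: adding 2.5 µl to 7.5 µl of probe to reach 5× SSC/0.1% SDS in 10 µl requires a 4-fold concentrate. The Python sketch below does that calculation and converts an SSC strength into molarities using the 20× SSC definition given earlier (3 M NaCl, 0.3 M Na3 citrate). The function names are hypothetical.

```python
from typing import Tuple


def concentrate_strength(final: float, total_ul: float, added_ul: float) -> float:
    """Strength a concentrate must have so that added_ul of it, brought to
    total_ul, gives the requested final strength (C1*V1 = C2*V2)."""
    return final * total_ul / added_ul


def ssc_molarity(strength_x: float) -> Tuple[float, float]:
    """(NaCl, Na3 citrate) in mol/l, scaled from 20x SSC = 3 M NaCl, 0.3 M citrate."""
    return 3.0 * strength_x / 20, 0.3 * strength_x / 20


# 7.5 ul probe in TE + 2.5 ul concentrate -> 10 ul at 5x SSC / 0.1% SDS
print(concentrate_strength(5.0, 10.0, 2.5))   # 20.0  (x SSC)
print(concentrate_strength(0.1, 10.0, 2.5))   # 0.4   (% SDS)
nacl, citrate = ssc_molarity(5.0)              # about 0.75 M NaCl, 0.075 M citrate
```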



Detection and Analysis

A custom-built laser scanner was used to detect the two-color fluorescence hybridization signals from 1.8-cm × 1.8-cm arrays at 20-µm resolution. The glass substrate slide was mounted on a computer-controlled, two-axis translation stage (PM-500, Newport, Irvine, CA) that scanned the array over an upward-facing microscope objective (20×, 0.75 NA Fluor, Nikon, Melville, NY) in a bidirectional raster pattern. A water-cooled argon/krypton laser (Innova 70 Spectrum, Coherent, Palo Alto, CA), operated in multiline mode, allowed for simultaneous specimen illumination at 488.0 nm and 568.2 nm. These two lines were isolated by a 488/568 dual-band excitation filter (Chroma Technology, Brattleboro, VT). An epifluorescence configuration with a dual-band 488/568 primary beam splitter (Chroma) excited both fluorophores simultaneously and directed fluorescence emissions toward the two-channel detector. Emissions were split by a secondary dichroic mirror with a 565 transition wavelength onto two multialkali cathode photomultiplier tubes (PMT; R928, Hamamatsu, Bridgewater, NJ), one with an HQ535/50 bandpass barrier filter and the other with a D630/60 bandpass barrier filter (Chroma). Preamplified PMT signals were read into a personal computer using a 12-bit analog-to-digital conversion board (RTI-834, Analog Devices, Norwood, MA), displayed in a graphics window, and stored to disk for further rendering and analysis. The back aperture of the 20× objective was deliberately underfilled by the illuminating laser beam to produce a large-diameter illuminating spot at the specimen (5-µm to 10-µm half-width). Stage scanning velocity was 100 mm/sec, and PMT signals were digitized at 100-µsec intervals. Two successive readings were summed for each pixel such that pixel spacing in the final image was 20 µm. Beam power at the specimen was ~5 mW for each of the two lines.
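The scan parameters above are internally consistent, as this Python sketch shows: 100 mm/sec × 100 µsec gives 10 µm of stage travel per reading, and summing two readings gives the 20-µm pixel. The scan-time estimate is our own addition, assumes a square array 1.8 cm on a side, and ignores acceleration and turnaround at the end of each line; because the raster is bidirectional, no flyback pass is counted.

```python
def sample_spacing_um(velocity_mm_s: float, interval_us: float) -> float:
    """Distance the stage moves between successive digitized readings."""
    um_per_s = velocity_mm_s * 1_000
    return um_per_s * interval_us / 1_000_000


def scan_time_s(side_um: float, pixel_um: float, velocity_mm_s: float) -> float:
    """Raster time for a square area, ignoring stage turnaround at line ends.
    Bidirectional scanning means no flyback pass is needed between lines."""
    n_lines = side_um / pixel_um
    return n_lines * side_um / (velocity_mm_s * 1_000)


step = sample_spacing_um(100, 100)   # 100 mm/sec stage, 100-usec sampling
pixel = 2 * step                     # two successive readings summed per pixel
print(step, pixel)                   # 10.0 20.0
print(scan_time_s(18_000, pixel, 100))   # 162.0 s for an assumed 1.8-cm square
```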

The scanned image was despeckled using a graphics program (Hijaak Graphics Suite) and then analyzed using a custom image gridding program that created a spreadsheet of the average red and green hybridization intensities for each spot. The red and green hybridization intensities were corrected for optical cross talk between the fluorescein and lissamine channels, using experimentally determined coefficients.
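The paper does not give the form of the cross-talk correction. One common way to apply experimentally determined coefficients is to treat bleed-through as linear and invert the resulting 2 × 2 system, sketched below in Python. The linear model and the coefficient values are assumptions for illustration only.

```python
from typing import Tuple


def correct_crosstalk(green_obs: float, red_obs: float,
                      green_into_red: float,
                      red_into_green: float) -> Tuple[float, float]:
    """Invert a two-channel linear bleed-through model (an assumed model):

        green_obs = green + red_into_green * red
        red_obs   = red   + green_into_red * green
    """
    det = 1.0 - green_into_red * red_into_green
    if abs(det) < 1e-12:
        raise ValueError("coefficients make the 2x2 system singular")
    green = (green_obs - red_into_green * red_obs) / det
    red = (red_obs - green_into_red * green_obs) / det
    return green, red


# Hypothetical coefficients: 10% of fluorescein signal appears in the lissamine
# (red) channel and 5% of lissamine signal appears in the fluorescein (green) channel.
g, r = correct_crosstalk(1025.0, 600.0, green_into_red=0.10, red_into_green=0.05)
print(round(g, 6), round(r, 6))   # 1000.0 500.0
```

In practice the two coefficients would be measured from spots hybridized with only one of the two labeled probes, each read in both channels.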
ABSTRACT cDNA microarray technology is used to profile complex diseases and discover novel disease-related genes. In inflammatory disease such as rheumatoid arthritis, expression patterns of diverse cell types contribute to the pathology. We have monitored gene expression in this disease state with a microarray of selected human genes of probable significance in inflammation as well as with genes expressed in peripheral human blood cells. Messenger RNA from cultured macrophages, chondrocyte cell lines, primary chondrocytes, and synoviocytes provided expression profiles for the selected cytokines, chemokines, DNA binding proteins, and matrix-degrading metalloproteinases. Comparisons between tissue samples of rheumatoid arthritis and inflammatory bowel disease verified the involvement of many genes and revealed novel participation of the cytokine interleukin 3, chemokine Groα and the metalloproteinase matrix metallo-elastase in both diseases. From the peripheral blood library, tissue inhibitor of metalloproteinase 1, ferritin light chain, and manganese superoxide dismutase genes were identified as expressed differentially in rheumatoid arthritis compared with inflammatory bowel disease. These results successfully demonstrate the use of the cDNA microarray system as a general approach for dissecting human diseases.



The recently described cDNA microarray or DNA-chip technology allows expression monitoring of hundreds and thousands of genes simultaneously and provides a format for identifying genes as well as changes in their activity (1, 2). Using this technology, two-color fluorescence patterns of differential gene expression in the root versus the shoot tissue of Arabidopsis were obtained in a specific array of 48 genes (1). In another study using a 1000-gene array from a human peripheral blood library, novel genes expressed by T cells were identified upon heat shock and protein kinase C activation (3).

In this technology, cDNA sequences or cDNA inserts from a library are amplified by PCR and arrayed on a glass slide by high-speed robotics at a density of 1000 cDNA sequences per cm². These microarrays serve as gene targets for hybridization to cDNA probes prepared from RNA samples of cells or tissues. A two-color fluorescence labeling technique is used in the preparation of the cDNA probes, so that simultaneous hybridization with separate detection of the two signals gives a comparative analysis of the relative abundance of the specific genes expressed (1, 2). Microarrays can be constructed from specific cDNA clones of interest, a cDNA library, or a select number of open reading frames from a genome sequencing database to allow a large-scale functional analysis of expressed sequences.
Because of the wide spectrum of genes and endogenous mediators involved, the microarray technology is well suited for analyzing chronic diseases. In rheumatoid arthritis (RA), inflammation of the joint is caused by the gene products of many different cell types present in the synovium and cartilage tissues plus those infiltrating from the circulating blood. The autoimmune and inflammatory nature of the disease is a cumulative result of genetic susceptibility factors and multiple responses, paracrine and autocrine in nature, from macrophages, T cells, plasma cells, neutrophils, synovial fibroblasts, chondrocytes, etc. Growth factors, inflammatory cytokines (4), and the chemokines (5) are the important mediators of this inflammatory process. The ensuing destruction of the cartilage and bone by the invading synovial tissue includes the actions of prostaglandins and leukotrienes (6), and the matrix-degrading metalloproteinases (MMPs). The MMPs are an important class of Zn-dependent metallo-endoproteinases that can collectively degrade the proteoglycan and collagen components of the connective tissue matrix (7).

This paper presents a study in which the involvement of select classes of molecules in RA was examined. Also investigated were 1000 human genes randomly selected from a peripheral human blood cell library. Their differential and quantitative expression was analyzed in cells of the joint tissue, in diseased RA tissue, and in inflammatory bowel disease (IBD) tissues to demonstrate the utility of the microarray method for analyzing complex diseases by their pattern of gene expression. Such a survey not only provides insight into the underlying cause of the pathology but also provides the opportunity to selectively target genes for disease intervention by appropriate drug development and gene therapies.

METHODS 

Microarray Design, Development, and Preparation. Two approaches for the fabrication of cDNA microarrays were used in this study. In the first approach, known human genes of probable significance in RA were identified. Regions of the clones, preferably 1 kb in length, were selected by their proximity to the 3' end of the cDNA and for areas of least identity to related and repetitive sequences. Primers were synthesized to amplify the target regions by standard PCR protocols (3). Products were verified by gel electrophoresis, purified with the Qiaquick 96-well purification kit (Qiagen, Chatsworth, CA), lyophilized (Savant), and resuspended in 5 µl of 3× standard saline citrate (SSC) buffer for arraying. In the second approach, the microarray containing the 1056 human genes from the peripheral blood lymphocyte library was prepared as described (3).

Abbreviations: RA, rheumatoid arthritis; MMP, matrix-degrading metalloproteinase; IBD, inflammatory bowel disease; LPS, lipopolysaccharide; PMA, phorbol 12-myristate 13-acetate; TNF-α, tumor necrosis factor α; IL, interleukin; TGF-β, transforming growth factor β; GCSF, granulocyte colony-stimulating factor; MIP, macrophage inflammatory protein; MIF, migration inhibitory factor; HME, human matrix metallo-elastase; RANTES, regulated upon activation, normal T cell expressed and secreted; Gel, gelatinase; VCAM, vascular cell adhesion molecule; ICE, IL-1 converting enzyme; PUMP, putative metalloproteinase; MnSOD, manganese superoxide dismutase; TIMP, tissue inhibitor of metalloproteinase; MCP, macrophage chemotactic protein.

Tissue Specimens. Rheumatoid synovial tissue was obtained from patients with late-stage classic RA undergoing remedial synovectomy or arthroplasty of the knee. Synovial tissue was separated from any associated connective tissue or fat. One gram of each synovial specimen was subjected to RNA extraction within 40 min of surgical excision, or explants were cultured in serum-free medium to examine any changes under in vitro conditions. For IBD, specimens of macroscopically inflamed lower intestinal mucosa were obtained from patients with Crohn disease undergoing remedial surgery. The hypertrophied mucosal tissue was separated from underlying connective tissue and extracted for RNA.

Cultured Cells. The Mono Mac-6 (MM6) monocytic cells (8) were grown in RPMI medium. Human chondrosarcoma SW1353 cells, primary human chondrocytes, and synoviocytes (9, 10) were cultured in DMEM; all culture media were supplemented with 10% fetal bovine serum, 100 µg/ml streptomycin, and 500 units/ml penicillin. Treatment of cells with lipopolysaccharide (LPS) endotoxin at 30 ng/ml, phorbol 12-myristate 13-acetate (PMA) at 50 ng/ml, tumor necrosis factor α (TNF-α) at 50 ng/ml, interleukin (IL)-1β at 30 ng/ml, or transforming growth factor β (TGF-β) at 100 ng/ml is described in the figure legends.



Fluorescent Probe, Hybridization, and Scanning. Isolation of mRNA, probe preparation, and quantitation with Arabidopsis control mRNAs were essentially as described (3) except for the following minor modification. Following the reverse transcriptase step, the appropriate Cy3- and Cy5-labeled samples were pooled, and the mRNA was degraded by heating the sample to 65°C for 10 min with the addition of 5 µl of 0.5 M NaOH plus 0.3 ml of 10 mM EDTA.
The pooled cDNA was purified from unincorporated nucleotides by gel filtration in Centri-spin columns (Princeton Separations, Adelphia, NJ). Samples were lyophilized and dissolved in 6 µl of hybridization buffer (5× SSC plus 0.2% SDS). Hybridizations, washes, scanning, quantitation procedures, and pseudocolor representations of fluorescent images have been described (3). Scans for the two fluorescent probes were normalized either to the fluorescence intensity of Arabidopsis mRNAs spiked into the labeling reactions (see Figs. 2-4) or to the signal intensity of β-actin and glyceraldehyde-3-phosphate dehydrogenase (GAPDH; see Fig. 5).
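A minimal Python sketch of the control-based normalization described above: the Cy5 channel is scaled so that its summed control signal equals the Cy3 control signal, and each remaining gene is reported as a normalized Cy5/Cy3 ratio. Summing the controls is one reasonable choice, not necessarily the authors' procedure. We assume, as in the Results, that Cy3 labels the untreated reference and Cy5 the treated sample; the gene keys and intensities are made up.

```python
from typing import Dict, Iterable


def normalization_factor(cy3: Dict[str, float], cy5: Dict[str, float],
                         controls: Iterable[str]) -> float:
    """Scale factor bringing summed Cy5 control signal level with Cy3."""
    controls = list(controls)
    return sum(cy3[c] for c in controls) / sum(cy5[c] for c in controls)


def normalized_ratios(cy3: Dict[str, float], cy5: Dict[str, float],
                      controls: Iterable[str]) -> Dict[str, float]:
    """Normalized Cy5/Cy3 ratio for every non-control element."""
    controls = set(controls)
    k = normalization_factor(cy3, cy5, controls)
    return {g: k * cy5[g] / cy3[g] for g in cy3 if g not in controls}


# Made-up intensities; "spike1"/"spike2" stand in for Arabidopsis controls
# added in equal amounts to both labeling reactions.
cy3 = {"spike1": 100.0, "spike2": 300.0, "TNF": 50.0, "GAPDH": 800.0}
cy5 = {"spike1": 200.0, "spike2": 600.0, "TNF": 400.0, "GAPDH": 1600.0}
print(normalized_ratios(cy3, cy5, ["spike1", "spike2"]))
# {'TNF': 4.0, 'GAPDH': 1.0}
```

Passing `["GAPDH"]` (or β-actin and GAPDH together) as the control set gives the housekeeping-gene variant of the same normalization.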

RESULTS 

Ninety-Six-Gene Microarray Design. The actions of cytokines, growth factors, chemokines, transcription factors, MMPs, prostaglandins, and leukotrienes are well recognized in inflammatory disease, particularly RA (11-14). Fig. 1 displays the selected genes for this study and also includes control cDNAs of housekeeping genes such as β-actin and GAPDH and genes from Arabidopsis for signal normalization and quantitation (row A, columns 1-12).
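The "row A, columns 1-12" reference suggests that the 96-element array is indexed like a 96-well plate (rows A-H, columns 1-12). Assuming that layout, which we infer rather than take from the figure, a small Python helper can map well labels to positions and list the control row.

```python
import string
from typing import List

ROWS = string.ascii_uppercase[:8]   # A-H
COLS = range(1, 13)                 # 1-12


def well_index(well: str) -> int:
    """Zero-based position of a well label such as 'A1' or 'H12' (row-major)."""
    row, col = well[0].upper(), int(well[1:])
    if row not in ROWS or col not in COLS:
        raise ValueError(f"not a 96-well label: {well!r}")
    return ROWS.index(row) * 12 + (col - 1)


def control_wells() -> List[str]:
    """Row A, columns 1-12: housekeeping and Arabidopsis normalization controls."""
    return [f"A{c}" for c in COLS]


print(well_index("A1"), well_index("H12"))   # 0 95
print(len(control_wells()))                  # 12
```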

Defining Microarray Assay Conditions. Different lengths and concentrations of target DNA were tested by arraying PCR-amplified products ranging from 0.2 to 1.2 kb at concentrations of 1 µg/µl or less. No significant difference in the signal levels was observed within this range of target size, and only with the 0.2-kb length was the signal reduced upon an 8-fold dilution of the 1 µg/µl sample (data not shown). In this study the average length of the targets was 1 kb, with a few exceptions in the range of ~300 bp, arrayed at a concentration of 1 µg/µl. Normally one PCR provided sufficient material to fabricate up to 1000 microarray targets.

In considering positional effects in the development of the targets for the microarrays, selection was biased toward the 3' proximal regions, because the signal was reduced if the target fragment was biased toward the 5' end (data not shown). This result was anticipated since the hybridizing probe is prepared by reverse transcription with oligo(dT)-primed mRNA and is richer in 3' proximal sequences. Cross-hybridizations of probes to targets of a gene family were analyzed with the matrix metalloproteinases as the example because they can show regions of sequence identity of greater than 70%. With collagenase-1 (Col-1) and collagenase-2 (Col-2) genes as targets with up to 70% sequence identity, and stromelysin-1 (Strom-1) and stromelysin-2 (Strom-2) genes with different degrees of identity, our results showed that a short region of overlap, even with 70-90% sequence identity, produced a low level of cross-hybridization. However, shorter regions of identity spread over the length of the target resulted in cross-hybridization (data not shown). For closely related genes, targets were designed by avoiding long stretches of homology. For members of a gene family, two or more target regions were included to discriminate between specificity of signal and cross-hybridization.
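One way to screen candidate target regions for "long stretches of homology" with other family members is to compute the longest identical run they share, sketched below with the standard dynamic-programming longest-common-substring algorithm. The cutoff `max_run` is a hypothetical parameter; the paper gives no numeric threshold.

```python
from typing import List


def longest_shared_run(a: str, b: str) -> int:
    """Length of the longest identical substring shared by a and b.

    Classic dynamic-programming longest-common-substring, O(len(a) * len(b)).
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best:
                    best = cur[j]
        prev = cur
    return best


def acceptable_targets(candidates: List[str], relatives: List[str],
                       max_run: int) -> List[str]:
    """Keep candidate target regions whose longest identical run with every
    related family member is shorter than max_run (a hypothetical cutoff)."""
    return [c for c in candidates
            if all(longest_shared_run(c.upper(), r.upper()) < max_run
                   for r in relatives)]


print(longest_shared_run("ACGTACGT", "TTACGTAA"))   # 5
```

A longest-run filter captures only the first concern in the text. Because dispersed short identities were also found to cross-hybridize, a fuller screen would add a windowed percent-identity check.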

Fig. 2. Time course for LPS/PMA-induced MM6 cells. Array elements are described in Fig. 1. (A) Pseudocolor representations of fluorescent scans correspond to gene expression levels at each time point. The array is made up of 8 Arabidopsis control targets and 86 human cDNA targets, the majority of which are genes with known or suspected involvement in inflammation. The color bars provide a comparative calibration scale between arrays and are derived from the Arabidopsis mRNA samples that are introduced in equal amounts during probe preparation. Fluorescent probes were made by labeling mRNA from untreated MM6 cells or LPS- and PMA-treated cells. mRNA was isolated at indicated times after induction. (B I-III) The two-color samples were cohybridized, and microarray scans provided the data for the levels of select transcripts at different time points relative to abundance at time zero. The analysis was performed using normalized data collected from 8-bit images.

Monitoring Differential Expression in Cultured Cell Lines. In RA tissue, the monocyte/macrophage population plays a prominent role in phagocytic and immunomodulatory activities. Typically these cells, when triggered by an immunogen, produce the proinflammatory cytokines TNF and IL-1. We have used the monocyte cell line MM6 and monitored changes in gene expression upon activation with LPS endotoxin, a component of Gram-negative bacterial membranes, and PMA, which augments the action of LPS on TNF production (15). RNA was isolated at different times after induction and used for cDNA probe preparation. From this time course it was clear that TNF expression was induced within 15 min of treatment, reached maximum levels in 1 hr, remained high until 4 hr and subsequently declined (Fig. 2A). Many other cytokine genes were also transiently activated, such as IL-1α and -β, IL-6, and granulocyte colony-stimulating factor (GCSF). Prominent chemokines activated were IL-8, macrophage inflammatory protein (MIP)-1β, more so than MIP-1α, and Groα or melanoma growth stimulatory factor. Migration inhibitory factor (MIF), expressed in the uninduced state, declined in LPS-activated cells. Of the immediate early genes, the noticeable ones were c-fos, fra-1, c-jun, NF-κB p50, and IκB, with c-rel expression observed even in the uninduced state (Fig. 2B). These expression patterns are consistent with reported patterns of activation of certain LPS- and PMA-induced genes (12). Demonstrated here is the unique ability of this system to allow parallel visualization of a large number of gene activities over a period of time.
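Time courses like the one described for MM6 cells are summarized as signal relative to time zero, with attention to when each gene peaks. The Python sketch below computes both from normalized intensities keyed by minutes after induction. The input values are made up for illustration and are not data from this study.

```python
from typing import Dict, Tuple


def fold_vs_time_zero(series: Dict[int, float]) -> Dict[int, float]:
    """Normalized signal at each time point divided by the time-zero signal."""
    baseline = series[0]
    return {t: v / baseline for t, v in sorted(series.items())}


def peak(series: Dict[int, float]) -> Tuple[int, float]:
    """(time, fold change) at maximal induction."""
    folds = fold_vs_time_zero(series)
    t_max = max(folds, key=folds.get)
    return t_max, folds[t_max]


# Made-up normalized intensities keyed by minutes after induction (not study data)
toy = {0: 10.0, 15: 40.0, 60: 120.0, 240: 100.0, 480: 30.0}
print(fold_vs_time_zero(toy))   # {0: 1.0, 15: 4.0, 60: 12.0, 240: 10.0, 480: 3.0}
print(peak(toy))                # (60, 12.0)
```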

SW1353 is a cell line derived from a malignant tumor of the cartilage that behaves much like chondrocytes in its expression of MMPs upon stimulation with TNF and IL-1 (9). In addition to confirming our earlier observations with Northern blots on Strom-1, Col-1, and Col-3 expression (9), we found that gelatinase (Gel) A, putative metalloproteinase (PUMP)-1, membrane-type matrix metalloproteinase, and the tissue inhibitors of metalloproteinases TIMP-1, -2, and -3 were also expressed by these cells, together with the human matrix metallo-elastase (HME; Fig. 1A). HME induction was estimated to be ~50-fold and was greater than that of any of the other MMPs examined (Fig. 3B). This result was unexpected because HME is reportedly expressed only by alveolar macrophage and placental cells (16). Expression of the cytokines and chemokines IL-6, IL-8, MIF, and MIP-1β was also noted. A variety of other genes, including certain transcription factors, were also up-regulated (Fig. 3), but the overall time-dependent expression of genes in the SW1353 cells was qualitatively distinct from that in the MM6 cells.

Fig. 3. Time course for IL-1β- and TNF-induced SW1353 cells using the inflammation array (Fig. 1). (A) Pseudocolor representations of fluorescent scans correspond to gene expression levels at each time point. (B I-IV) Relative levels of selected genes at different time points

Quantitation of differential gene expression (Figs. 2B and 3B) was achieved with the simultaneous hybridization of Cy3-labeled cDNA from untreated cells and Cy5-labeled cDNA from treated samples. The estimated increases in expression from these microarrays for a select number of genes, including IL-1β, IL-8, MIP-1β, TNF, HME, Col-1, Col-3, Strom-1, and Strom-2, were compared with data collected from dot blot analysis. Results (not shown) were in close agreement and confirmed our earlier observations on the use of the microarray method for the quantitation of gene expression (3).
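The paper reports the array-versus-dot-blot comparison only qualitatively. One simple way to quantify such agreement, sketched here in Python as an assumption rather than the authors' method, is the Pearson correlation of log2 fold changes. The log scale keeps a 4-fold induction and a 4-fold repression equally far from zero. The fold-change values are made up for illustration.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log2_agreement(array_folds: List[float], blot_folds: List[float]) -> float:
    """Correlation of log2 fold changes from two measurement methods."""
    return pearson([math.log2(f) for f in array_folds],
                   [math.log2(f) for f in blot_folds])


# Made-up fold changes for illustration only
array_est = [2.0, 4.0, 8.0, 16.0]
blot_est = [2.5, 3.5, 9.0, 14.0]
r = log2_agreement(array_est, blot_est)
print(round(r, 3))
```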

Expression Profiles in Primary Chondrocytes and Synoviocytes of Human RA Tissue. Given the sensitivity and the specificity of this method, expression profiles of primary synoviocytes and chondrocytes from diseased tissue were examined. Without prior exposure to inducing agents, low-level expression of c-jun, GCSF, IL-3, TNF-β, MIF, and RANTES (regulated upon activation, normal T cell expressed and secreted) was seen, as well as expression of the MMPs GelA, Strom-1, and Col-1 and the three TIMPs. In this case, Col-2 hybridization was considered to be nonspecific because the second Col-2 target taken from the 3' end of the gene gave no signal. Treatment with PMA and IL-1, more so than with TNF and IL-1, produced a dramatic up-regulation in expression of several genes in both of these primary cell types. These genes are as follows: the cytokine IL-6; the chemokines IL-8 and Groα; the MMPs Strom-1, Col-1, Col-3, and HME; and the adhesion molecule vascular cell adhesion molecule 1 (VCAM-1). The surprise again is HME expression in these primary cells, for reasons discussed above. From these results, the expression profiles of synoviocytes and chondrocytes appear very similar; the differences are more quantitative than qualitative. Treatment of the primary chondrocytes with the anabolic growth factor TGF-β had an interesting profile in that it produced a remarkable down-regulation of genes expressed in both the untreated and induced state (Fig. 4).

Fig. 4 panels: (A) Human synovial fibroblasts; (B) Human articular chondrocytes.

Fig. 4. Expression profiles for early passage primary synoviocytes and chondrocytes isolated from RA tissue, cultured in the presence of 10% fetal calf serum and activated with PMA and IL-1β, or TNF and IL-1β, or TGF-β for 18 hr. The color bars provide a comparative calibration scale between arrays and are derived from the Arabidopsis mRNA samples that

Given the demonstrated effectiveness of this technology, a comparative analysis of two different inflammatory disease states was conducted with probes made from RA tissue and IBD samples. RA samples were from late-stage rheumatoid synovial tissue, and IBD specimens were obtained from inflamed lower intestinal mucosa of patients with Crohn disease. With both the 96-element known gene microarray and the 1000-gene microarray of cDNAs selected from a peripheral human blood cell library (3), distinct differences in gene expression patterns were evident. On the 96-gene array, RA tissue samples from different affected individuals gave similar profiles (data not shown), as did different samples from the same individual (Fig. 5). These patterns were notably similar to those observed with primary synoviocytes and chondrocytes (Fig. 4). Included in the list of prominently up-regulated genes are IL-6; the MMPs Strom-1, Col-1, GelA, HME, and, in certain samples, PUMP; the TIMPs, particularly TIMP-1 and TIMP-3; and the adhesion molecule VCAM. Discernible levels of macrophage chemotactic protein 1 (MCP-1), MIF, and RANTES were also noted. IBD samples were, in comparison, rather subdued, although IL-1 converting enzyme (ICE), TIMP-1, and MIF were notable in all three IBD samples examined here. In IBD-A, one of the three individual samples, ICE, VCAM, Groα, and MMP expression was more pronounced than in the others.

Fig. 5. Expression profiles of RA tissue (A) and IBD tissue (B). mRNA from RA tissue samples obtained from the same individual was isolated directly after excision (RA 21JA) or maintained in culture without serum for 2 hr (RA 21JB) or for 6 hr (RA 21JC). Profiles from tissue samples of two other individuals (data not shown) were remarkably similar to the ones shown here. IBD-A and IBD-CI are from mRNA samples prepared directly after surgery from two separate individuals. For the IBD-CII probe, the tissue sample was cultured in medium without serum for 2 hr before mRNA preparation.

We also made use of a peripheral blood cDNA library (3) to identify genes expressed by lymphocytes infiltrating the inflamed tissues from the circulating blood. With the 1046-element array of randomly selected cDNAs from this library, probes made from RA and IBD samples showed hybridization to a large number of genes. Of these, many were common to the two disease tissues while others were differentially expressed (data not shown). A complete survey of these genes was beyond the scope of this study, but for this report we picked three genes that were up-regulated in the RA tissue relative to IBD. These cDNAs were sequenced and identified by comparison to the GenBank database. They are TIMP-1, apoferritin light chain, and manganese superoxide dismutase (MnSOD). Differential expression of MnSOD was observed only in samples of RA tissue explants maintained in growth medium without serum for between 2 and 16 hr. These results also indicate that the expression profile of genes can be altered when explants are transferred to culture conditions.

DISCUSSION 

The speed, ease, and feasibility of simultaneously monitoring differential expression of hundreds of genes with the cDNA microarray-based system (1-3) are demonstrated here in the analysis of a complex disease such as RA. Many different cell types in the RA tissue (macrophages, lymphocytes, plasma cells, neutrophils, synoviocytes, chondrocytes, etc.) are known to contribute to the development of the disease through the expression of gene products known to be proinflammatory. These include the cytokines, chemokines, growth factors, MMPs, eicosanoids, and others (7, 11-14); the design of the 96-element known gene microarray was based on this knowledge and on the availability of the genes. The technology was validated by confirming earlier observations on the expression of TNF by the monocyte cell line MM6 and of Col-1 and Col-3 in the chondrosarcoma cells and articular chondrocytes (9, 12). In our time-dependent survey, the chronological order of gene activities within and between gene families was compared, and the results have provided unprecedented profiles of the cytokines (TNF, IL-1, IL-6, GCSF, and MIF), chemokines (MIP-1α, MIP-1β, IL-8, and Gro-1), certain transcription factors, and the matrix metalloproteinases (GelA, Strom-1, Col-1, Col-3, HME) in the macrophage cell line MM6 and in the SW1353 chondrosarcoma cells.

Earlier reports of cytokine production in the diseased state had established a model in which TNF is a major participant in RA. Its expression reportedly preceded that of the other cytokines and effector molecules (4). Our results strongly support this model, as demonstrated in the time course of the MM6 cells, where TNF induction preceded that of IL-1α and IL-1β, followed by IL-6 and GCSF. These expression profiles demonstrate the utility of microarrays in determining the hierarchy of signaling events.
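The ordering logic behind such a hierarchy can be sketched briefly. Assuming each gene's time course is summarized as induced/uninduced fluorescence ratios, one may take the first time point at which a gene reaches a chosen threshold (2.0-fold here, an assumption) as its onset and rank genes by onset. The function names, gene labels, and numbers below are hypothetical and are not data from this study.

```python
from typing import Dict, List, Optional, Sequence


def onset_time(times: Sequence[float], ratios: Sequence[float],
               threshold: float = 2.0) -> Optional[float]:
    """First time point at which the ratio reaches the threshold, else None."""
    for t, r in zip(times, ratios):
        if r >= threshold:
            return t
    return None


def induction_order(times: Sequence[float],
                    profiles: Dict[str, Sequence[float]],
                    threshold: float = 2.0) -> List[str]:
    """Rank genes by onset time; genes that never reach the threshold are omitted."""
    onsets = {g: onset_time(times, r, threshold) for g, r in profiles.items()}
    induced = [g for g, t in onsets.items() if t is not None]
    return sorted(induced, key=lambda g: onsets[g])


if __name__ == "__main__":
    hours = [0.5, 1, 2, 4, 8]  # hypothetical sampling times (hr)
    toy = {  # hypothetical ratios, not measured values
        "gene_C": [1.0, 1.1, 1.4, 2.5, 4.0],
        "gene_A": [2.2, 3.5, 3.0, 2.0, 1.5],
        "gene_B": [1.2, 2.4, 3.1, 2.8, 2.0],
    }
    print(induction_order(hours, toy))
```

A single threshold crossing is the simplest onset definition; with noisy data one might instead require the threshold to be met at two consecutive time points.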

In the SW1353 chondrosarcoma cells, all the known MMPs and TIMPs were examined simultaneously. HME expression, which had previously been observed only in the stromal cells and alveolar macrophages of smokers' lungs and in placental tissue, was discovered. Its presence in cells of the RA tissue is meaningful because its activity can cause significant destruction of elastin and basement membrane components (16, 17). Expression profiles of synovial fibroblasts and articular chondrocytes were remarkably similar to each other and not very different from those of the SW1353 cells, indicating that the fibroblast and the chondrocyte can play equally aggressive roles in joint erosion. The most prominent genes expressed were the MMPs, but chemokines and cytokines were also produced by these cells. The effect of the anabolic growth factor was clearly evident in the down-regulation of these catabolic activities.

RA tissue samples clearly reflected profiles similar to those of the cell types examined. Active genes observed were IL-3, IL-6, ICE, the MMPs including HME, and the TIMPs; the chemokines IL-8, Groα, MIP, MIF, and RANTES; and the adhesion molecule VCAM. Of the growth factors, fibroblast growth factor β was observed most frequently. In comparison, the expression patterns in the other inflammatory state (i.e., IBD) were not as marked as in the RA samples, at least in the tissue samples selected for this study.

As an alternative approach, the 1046-cDNA microarray of randomly selected genes from a lymphocyte library was used to identify genes expressed in RA tissue (3). Many genes on this array hybridized with probes made from both RA and IBD tissue samples. The results are not surprising because inflammatory tissue is abundantly supplied with cell types infiltrating from the circulating blood, as is also apparent from the high levels of chemokine expression in RA tissue. Because of the magnitude of the effort required to identify all the hybridized genes, we have chosen for this report to describe only three differentially expressed genes, mainly to verify this method of analysis.

Of the large number of genes observed here, a fair number were already known as active participants in inflammatory disease. These are TNF, IL-1, IL-6, IL-8, GCSF, RANTES, and VCAM. The novel participants not previously reported are HME, IL-3, ICE, and Groα. With our discovery of HME expression in RA, this gene becomes a potential target for drug intervention. ICE is a cysteine protease well known for its IL-1β processing activity (18) and recognized for its role in apoptotic cell death (19). Its expression in RA tissue is intriguing. IL-3 is recognized for its growth-promoting activity in hematopoietic cell lineages and is a product of activated T cells (20); its expression in synoviocytes and chondrocytes of RA tissue is a novel observation.

Like IL-8, Groα is a C-X-C subgroup chemokine and a potent neutrophil and basophil chemoattractant. It down-regulates the expression of types I and III interstitial collagens (21, 22) and is seen here to be produced by the MM6 cells, primary synoviocytes, and RA tissue. With the presence of the C-C chemokines RANTES, MCP, and MIP-1β (23), migration and infiltration of monocytes and, particularly, T cells into the tissue is also enhanced (5), aiding the trafficking and recruitment of leukocytes into the RA tissue. Their activation, phagocytosis, degranulation, and respiratory bursts could be responsible for the induction of MnSOD in RA. MnSOD is also induced by TNF and IL-1 and serves a protective function against oxidative damage. The ferritin light chain gene may be induced in this tissue for reasons similar to those for MnSOD. Ferritin is the major intracellular iron storage protein, and it is responsive to intracellular oxidative stress and to reactive oxygen intermediates generated during inflammation (24, 25). The active expression of TIMP-1 in RA tissue, as detected by the 1000-element array, is no surprise because our results have repeatedly shown TIMP-1 to be expressed in the constitutive and induced states of RA cells and tissues.

The suitability of cDNA microarray technology for profiling diseases and identifying disease-related genes is demonstrated here. This technology could provide new targets for drug development and disease therapies and, in doing so, allow improved treatment of chronic diseases that are challenging because of their complexity.
A number of methods have been developed to quantitate, measure the size of, and map the 5' and 3' termini of specific mRNA molecules in preparations of cellular RNA. These include:

• Northern hybridization (RNA blotting), in which the size and amount of specific mRNA molecules in preparations of total or poly(A)+ RNA are determined (Alwine et al. 1977, 1979). The RNA is separated according to size by electrophoresis through a denaturing agarose gel and is then transferred to activated cellulose (Alwine et al. 1977; Seed 1982b), nitrocellulose (Goldberg 1980; Thomas 1980; Seed 1982a), or glass or nylon membranes (Bresser and Gillespie 1983) (see below). The RNA of interest is then located by hybridization with radiolabeled DNA or RNA followed by autoradiography.

• Dot and slot hybridization, in which an excess of radiolabeled probe is 
hybridized to RNA that has been immobilized on a solid support (Kafatos et 
al. 1979; Thomas 1980; White and Bancroft 1982). Densitometric tracings 
of the resulting autoradiographs can allow comparative estimates of the 
amount of the target sequence in various preparations of RNA. 

• Mapping RNA using nuclease S1 or ribonuclease, in which the precise positions of the 5' and 3' termini of the mRNA and the locations of splice junctions can be rigorously determined (Berk and Sharp 1977; Weaver and Weissmann 1979). Labeled or unlabeled RNA or DNA probes derived from various segments of the genomic DNA are hybridized to mRNA, often under conditions favoring the formation of DNA:RNA hybrids (Casey and Davidson 1977). The products of the hybridization are then digested with nuclease S1 or RNAase under conditions favoring digestion of single-stranded nucleic acids only. Analysis of the digestion products by gel electrophoresis yields important quantitative and qualitative information about the mRNA structure.

• Primer extension, in which a small radiolabeled fragment of DNA is 
hybridized to the mRNA and used as a primer for reverse transcriptase. 
The resulting product should extend to the extreme 5' terminus of the 
mRNA, and thus the size of the product reflects the number of nucleotides 
from the position of the label to the 5' terminus of the mRNA. 

• Solution hybridization, in which the absolute concentration of the sequence of interest is calculated from the rate of hybridization of a small amount of a specific radioactive probe with a known quantity of purified cellular RNA (see, e.g., Roop et al. 1978; Durnam and Palmiter 1983). Alternatively, an excess of a radiolabeled probe is incubated with a known amount of RNA. The concentration of the RNA of interest can then be estimated from the amount of radioactivity that becomes resistant to nuclease S1 (see, e.g., Favaloro et al. 1980; Beach and Palmiter 1981; Williams et al. 1986).
• Filter hybridization, in which purified cellular RNA is end-labeled with 32P and hybridized to a large excess of the homologous DNA that has been immobilized on a solid support (Williams et al. 1986).
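For solution hybridization, the rate-based calculation can be sketched as follows, assuming the cellular RNA is in large excess over the tracer probe so that hybridization is pseudo-first-order. Under that assumption the fraction of probe remaining single-stranded is exp(-ln2 · Rot/Rot½), where Rot is RNA concentration × time (mol nucleotide·s/liter), and comparing the observed Rot½ with that of a pure transcript standard gives the fraction of total RNA represented by the target. The function names and numbers are hypothetical illustrations, not a protocol from this manual.

```python
import math


def rot_half_from_point(rot: float, fraction_unhybridized: float) -> float:
    """Infer Rot1/2 from one point on a pseudo-first-order curve:
    fraction_unhybridized = exp(-ln2 * rot / rot_half)."""
    if rot <= 0:
        raise ValueError("rot must be positive")
    if not 0.0 < fraction_unhybridized < 1.0:
        raise ValueError("fraction_unhybridized must lie strictly between 0 and 1")
    return math.log(2) * rot / -math.log(fraction_unhybridized)


def transcript_mass_fraction(rot_half_cellular: float, rot_half_pure: float) -> float:
    """Fraction (wt/wt) of total RNA that is the target: a rarer target needs
    proportionally more total RNA (a higher Rot) to reach half hybridization."""
    return rot_half_pure / rot_half_cellular


if __name__ == "__main__":
    # Hypothetical measurement: 25% of the probe is still single-stranded at Rot = 200.
    r_half = rot_half_from_point(200.0, 0.25)
    # Hypothetical Rot1/2 of a pure transcript standard.
    frac = transcript_mass_fraction(r_half, 1e-3)
    print(f"Rot1/2 = {r_half:.1f}; target = {frac:.1e} of total RNA")
```

In practice Rot½ is read from a full kinetic curve rather than a single point; the one-point form is used here only to keep the arithmetic visible.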

Below we describe northern hybridization. Dot and slot hybridization of both crude and purified preparations of RNA are described beginning on page 7.53; nuclease-S1 and RNAase analysis of specific hybrids, beginning on pages 7.58 and 7.71, respectively; and analysis of mRNA by primer extension, beginning on page 7.79.
ABSTRACT Microarrays containing 1046 human cDNAs of unknown sequence were printed on glass with high-speed robotics. These 1.0-cm² DNA "chips" were used to quantitatively monitor differential expression of the cognate human genes using a highly sensitive two-color hybridization assay. Array elements that displayed differential expression patterns under given experimental conditions were characterized by sequencing. The identification of known and novel heat shock and phorbol ester-regulated genes in human T cells demonstrates the sensitivity of the assay. Parallel gene analysis with microarrays provides a rapid and efficient method for large-scale human gene discovery.

Biology has entered the genome era (1). Complete genome sequences for all of the model organisms and human will probably be available by the year 2003 (2). Torrents of human expressed sequence tags (ESTs) provide a starting point for elucidating the function of tens of thousands of cognate genes (3). Genome analysis will provide insights into growth, development, differentiation, homeostasis, aging, and the onset of diseases (1-3). A detailed understanding of the human genome will require the implementation of sophisticated methods for gene expression analysis and gene discovery.

Recently, a microarray-based method for high-throughput monitoring of plant gene expression was described (4). This "chip"-based approach used microarrays of cDNA clones as gene-specific hybridization targets to quantitatively measure expression of the corresponding plant genes (4, 5). A two-color fluorescence labeling and detection scheme facilitated sensitive differential expression analysis of different plant tissues (4, 5). The efficiency of this approach for studies in higher plants suggested the use of this method for human genome analysis (4-7). Here, we report the use of cDNA microarrays for human gene expression monitoring, biological investigation, and gene discovery.

MATERIALS AND METHODS
Human cDNA Clones. The cDNA library was made with mRNA from human peripheral blood lymphocytes transformed with the Epstein-Barr virus. Inserts >660 bp were cloned into the lambda vector λYES-R to generate 10'-10* recombinants. Bacterial transformants were obtained by infecting E. coli strain JM107/λKC. Colonies were picked at random and propagated in a 96-well format, and minilysate DNA was prepared by alkaline lysis using REAL preps (Qiagen, Chatsworth, CA). Inserts were amplified by PCR in a 96-well format using primers (PAN132, 5'-CCTCTATACTTTAACGTCAAGG; and PAN133, 5'-TTGTGTGGAATTGTGAGCGG) complementary to the λYES polylinker and containing a six-carbon amino modification (Glen Research, Sterling, VA) on the 5' end. PCR products were purified in a 96-well format using QIAquick columns (Qiagen).

Microarray Preparation. Amino-modified PCR products were suspended at a concentration of 0.5 mg/ml in 3× standard saline citrate (SSC) and arrayed from 96-well microtiter plates onto silylated microscope slides (CEL Associates, Houston) using high-speed robotics (4-7). A total of 1056 cDNAs, representing 1046 human clones and 10 Arabidopsis controls, were arrayed in 1.0-cm² areas. Printed arrays were incubated for 4 hr in a humid chamber to allow rehydration of the array elements and rinsed once in 0.2% SDS for 1 min, twice in H2O for 1 min, and once for 5 min in sodium borohydride solution (1.0 g of NaBH4 dissolved in 300 ml of PBS and 100 ml of 100% ethanol). The arrays were submerged in H2O for 2 min at 95°C, transferred quickly into 0.2% SDS … at 25°C.

Probes. Tissue mRNAs were purchased (CLONTECH). Jurkat mRNA was isolated as described by Schena et al. (4). Probes were made as described (4) with several modifications. The reverse transcriptase used here was SuperScript II RNase H− (GIBCO). The Cy5-dCTP was purchased from Amersham. Each reverse transcription reaction contained 3.0 µg of total human mRNA. Arabidopsis control mRNAs were made by in vitro transcription of cloned HAT4, HAT22, and YcsAt-23 cDNAs (4, 8, 9) using an RNA Transcription Kit (Stratagene). For quantitation, the control mRNAs were doped into the reverse transcription reaction at ratios of 1:100,000, 1:10,000, and 1:1000 (wt/wt), respectively. Following the reverse transcription step, samples were treated with 2.5 µl of 1 M sodium hydroxide for 10 min at 37°C, then neutralized by adding 15 µl of 1 M Tris-HCl (pH 6.8) and 2.0 µl of 1 M HCl. Probe mixtures contained cDNA products derived from 3 µg of total mRNA, suspended in 5.0 µl of hybridization buffer (5× SSC plus 0.2% SDS).
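The doping ratios correspond to small absolute amounts of control transcript. The sketch below, a simple illustration with a hypothetical helper name, converts each wt/wt ratio into picograms per 3.0-µg reaction; the 1:500,000 entry corresponds to the detection limit reported in the Results.

```python
TOTAL_MRNA_UG = 3.0  # total human mRNA per reverse transcription reaction (ug)


def spike_mass_pg(total_ug: float, ratio_denominator: float) -> float:
    """Mass (pg) of a transcript present at 1:ratio_denominator (wt/wt)."""
    return total_ug * 1e6 / ratio_denominator  # 1 ug = 1e6 pg


if __name__ == "__main__":
    for denom in (100_000, 10_000, 1_000, 500_000):
        print(f"1:{denom:,} -> {spike_mass_pg(TOTAL_MRNA_UG, denom):,.0f} pg")
```

With 3.0 µg of total mRNA, the three control ratios correspond to 30 pg, 300 pg, and 3 ng of transcript, and 1:500,000 to about 6 pg.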

Hybridization and Scanning. Probes were hybridized to 1.0-cm² microarrays under a 14 × 14 mm glass coverslip for 6-12 hr at 60°C in a custom-built hybridization chamber (4-7). Arrays were washed for 5 min at room temperature (25°C) in low stringency wash buffer (1× SSC/0.2% SDS), then for 10 min at room temperature in high stringency wash buffer (0.1× SSC/0.2% SDS). Arrays were scanned in 0.1× SSC using a fluorescence laser scanning device (4-7) fitted with a custom filter set (Chroma Technology, Brattleboro, VT). Accurate differential expression measurements (i.e., final fluorescence ratios) were obtained by taking the average of the ratios of two independent hybridizations.
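The averaging step can be sketched as a small function. Besides averaging the two ratios, the Fig. 2 legend excludes elements whose two ratios differ by more than half their average, so that rule is included. Each ratio is assumed to be already oriented as treated/control (i.e., the dye swap has been undone); the function name and the use of None for an excluded element are illustrative choices, and the signal values are hypothetical.

```python
from typing import Optional


def final_ratio(ratio_a: float, ratio_b: float) -> Optional[float]:
    """Mean of two treated/control ratios from independent hybridizations,
    or None if they differ by more than half of that mean."""
    mean = (ratio_a + ratio_b) / 2.0
    if abs(ratio_a - ratio_b) > 0.5 * mean:
        return None  # replicates too inconsistent to report
    return mean


if __name__ == "__main__":
    # Hypothetical element. Run 1: treated labeled with Cy5, control with fluorescein.
    run1_treated, run1_control = 5200.0, 1300.0
    # Dye-swapped run 2: treated labeled with fluorescein, control with Cy5.
    run2_treated, run2_control = 3900.0, 1100.0
    print(final_ratio(run1_treated / run1_control, run2_treated / run2_control))
```

Computing each ratio as treated/control regardless of which dye carried which sample lets dye-specific biases partly cancel in the average.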



Abbreviation: EST, expressed sequence tag.
Cell Culture. Jurkat cells were grown in a tissue culture incubator (37°C and 5% CO2) in RPMI medium supplemented with 10% fetal bovine serum, 100 µg of streptomycin per ml, and 500 units of penicillin per ml. Heat shock corresponded to a 4-hr incubation at 43°C. Phorbol ester-treated cells were grown for 4 hr in the presence of 50 ng of phorbol 12-myristate 13-acetate (PMA) per ml.

RNA Blotting. Dot blots were performed as described (4). 

DNA Sequencing. Sequences were obtained using the PAN132 and PAN133 primers and a 373A automated sequencer, according to the instructions of the manufacturer (Applied Biosystems).

Computer Graphics and Informatics. Pseudocolor representations of fluorescent images were made with National Institutes of Health Image software (version 1.52). Software for differential expression representations was purchased from Imaging Research (St. Catharines, ON, Canada). Sequence searches were made against the nonredundant nucleotide data base at the National Center for Biotechnology Information (NCBI) using Macintosh BLAST software. The EST data base was accessed via the World Wide Web (http://www.ncbi.nlm.nih.gov/).

RESULTS 

Gene Discovery and the Heat Shock Response. Microarrays were used to examine the heat shock response in cultured human T (Jurkat) cells. Control (37°C) and heat-treated (43°C) cells were harvested and lysed, and total mRNA from the two cell samples was labeled by reverse transcriptase incorporation of fluorescein- and Cy5-dCTP, respectively. In a second set of labeling reactions, the fluorescent groups were "swapped" such that samples from control and heat-treated cells were labeled with Cy5- and fluorescein-dCTP, respectively. Each pair of fluorescent probes was hybridized to a 1056-element microarray. The arrays were washed at high stringency and scanned with a confocal laser scanning device to detect emission of the two fluorescent groups.

Hybridization signals were observed for >95% of the human cDNA array elements, but not for any of the Arabidopsis negative controls (Fig. 1). Fluorescence intensities spanned more than three orders of magnitude for the 1046 array elements surveyed (Fig. 1). Comparative expression analysis of heat-shocked versus control cells in the two experiments revealed 17 array elements that displayed altered fluorescence ratios of ≥2.0-fold (Figs. 1 and 2A). Of the 17 putative differentially expressed genes, 11 were induced by heat shock treatment and 6 displayed modest repression (Figs. 1 and 2A).
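Selecting candidate elements from the final ratios amounts to a simple threshold filter. In this sketch the 2.0-fold cutoff follows the text, while treating a ratio of 0.5 or lower as 2-fold repression is the symmetric counterpart we assume; clone names and ratios are hypothetical.

```python
from typing import Dict, List, Tuple


def classify(ratios: Dict[str, float], fold: float = 2.0) -> Tuple[List[str], List[str]]:
    """Return (induced, repressed) clone names, each sorted by effect size."""
    induced = sorted((c for c, r in ratios.items() if r >= fold),
                     key=lambda c: ratios[c], reverse=True)
    repressed = sorted((c for c, r in ratios.items() if r <= 1.0 / fold),
                       key=lambda c: ratios[c])
    return induced, repressed


if __name__ == "__main__":
    toy = {"clone_01": 9.8, "clone_02": 1.1, "clone_03": 0.42, "clone_04": 2.1}  # hypothetical
    up, down = classify(toy)
    print("induced:", up, "repressed:", down)
```

Only the elements returned by such a filter would then go on to sequencing, which is what keeps the sequencing effort small relative to the size of the array.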

To determine the identity of the heat-regulated genes, cDNAs corresponding to each of the 17 array elements were sequenced on the proximal and distal ends. Data base searches revealed perfect matches for 14 of the 17 clones, and in each case the proximal and distal cDNA sequences mapped to the same gene (Table 1). Of the 1046 human genes examined on the microarray, the five most highly induced in heat-treated cells were heat shock protein 90α (hsp90α), dnaJ, hsp90β, polyubiquitin, and t-complex polypeptide 1 (tcp-1) (Table 1). Three of the 17 clones did not match any entry in the public data base, though one of the clones (B7) exhibited significant homology to an EST from Caenorhabditis elegans (Table 1). Each of the novel sequences (B7-B9) exhibited ~2-fold induction (Table 1) and relatively low-level expression (Table 2).

10616 Biochemistry: Schena et al.
Proc. Natl. Acad. Sci. USA 93 (1996)

[Fig. 2 image: panels A (−/+ Heat Shock) and B (−/+ Phorbol Ester); color bar of expression ratios]

Fig. 2. Elemental displays of activated and repressed genes. Fluorescence ratios of two-color microarray scans (Fig. 1) are depicted … treatment were compared with … fluorescent groups were swapped (see text). The data represent the average of the ratios from two hybridizations, excluding values in which the difference of the two ratios was greater than half the average ratio. The color bar corresponds to expression ratios, which are independent of the absolute expression level of a given gene.

Table 1. Microarray elements corresponding to differentially expressed genes

Clone name, array position (Fig. 1), fluorescence ratio, sequence identity, and accession number of cDNAs that manifested a differential expression pattern with probes prepared from heat shock- (B1-B17) or phorbol ester-treated (B18-B23) Jurkat cells. Clones showing >95% identity over ?00 nucleotides were assumed to be identical to known sequences. All genes are nuclear except CYC oxidase III (mitochondrial). Accession numbers reflect the highest score for proximal and distal sequence traces, respectively. CYC, cytochrome c; TCP-1, T-complex polypeptide; HSP, heat shock protein; PGK, phosphoglycerate kinase; NF-κB, nuclear factor-kappaB; PAC-1, phosphatase of activated cells; and NR, trace not readable due to the presence of a poly(A) tract.
*B7 is 67% identical to an EST from C. elegans (D76026).
†No match in the public data bases.

To confirm the microarray results, mRNA levels for each of the genes were measured by RNA blotting. Each of the genes that displayed heat shock induction, including the three novel sequences, exhibited elevated mRNA levels by dot blot analysis (Table 2). In all cases, expression ratios as determined by the two procedures differed by <2-fold for the genes identified in the heat shock experiments (Table 2). The two assays differed more widely in assessing absolute expression levels; nonetheless, absolute expression as monitored on a microarray typically agreed with RNA blots to within a factor of five (Table 2).
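The two agreement criteria can be expressed as a symmetric fold-difference check. In this sketch the tolerances (2-fold for expression ratios, 5-fold for absolute levels) mirror the text, while the function names and per-gene values are hypothetical and are not the contents of Table 2.

```python
def fold_difference(a: float, b: float) -> float:
    """Symmetric fold difference (always >= 1) between two positive values."""
    if a <= 0 or b <= 0:
        raise ValueError("values must be positive")
    return max(a, b) / min(a, b)


def concordant(array_value: float, blot_value: float, tolerance: float) -> bool:
    """True if the two measurements differ by at most `tolerance`-fold."""
    return fold_difference(array_value, blot_value) <= tolerance


if __name__ == "__main__":
    # Hypothetical gene: induction ratios and absolute levels from the two assays.
    array_ratio, blot_ratio = 2.3, 3.1
    array_level, blot_level = 4.0, 15.0
    print("ratios agree:", concordant(array_ratio, blot_ratio, 2.0))
    print("levels agree:", concordant(array_level, blot_level, 5.0))
```

Using max/min rather than a signed ratio treats over- and underestimates by the microarray alike.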

Phorbol Ester Signaling. To explore a signaling pathway distinct from the heat shock response, microarrays were used to examine the cellular effects of phorbol ester treatment. Jurkat cells were treated with phorbol ester, harvested, lysed, and used as a source of mRNA. Samples of mRNA from untreated or phorbol ester-stimulated cells were labeled with reverse transcriptase. The probes were mixed, hybridized to microarrays, and scanned for fluorescence emission of the two fluorescent groups. A total of six array elements displayed ≥2.0-fold elevated signals with probes from phorbol ester-treated cells relative to control samples (Fig. 2B).

To determine the identity of the phorbol ester-induced genes, clones corresponding to the six array elements were sequenced. Data base searches revealed perfect matches for five of the six sequences (Table 1). The two most highly induced genes were the PAC-1 tyrosine phosphatase and nuclear factor-kappa B1 (NF-κB1); modest activation was observed for phosphoglycerate kinase and β2-microglobulin (Table 1). The remaining clone (B19) did not match any entry in the public data base (Table 1). B19 displayed a 2-fold induction and, like the novel heat shock genes, a relatively low absolute expression level (Tables 1 and 2). All six of the phorbol ester-inducible genes displayed increased steady-state mRNA levels by RNA blotting (Table 2). PAC-1 expression (Fig. 1; Table 2) defined a detection limit of ~1:500,000 for the assay.

Transcript Imaging in Human Tissues. To determine whether microarrays could be used to monitor expression in human tissues, probes were prepared from human bone marrow, brain, prostate, and heart by labeling each mRNA sample with Cy5-dCTP. In a separate reaction, a control probe was prepared by labeling Jurkat mRNA with fluorescein-dCTP. The four Cy5-labeled probes were each mixed with an aliquot of the fluorescein-labeled control sample, and the four mixtures were hybridized to separate microarrays. The arrays were washed and scanned for fluorescence emission, and hybridization signals for each of the tissue samples were normalized to the Jurkat control to generate an expression profile for each of the 1046 clones present on the array.
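One way to turn such normalized signals into the abundance units of Fig. 3 is sketched below. It is a hypothetical illustration: we assume each element's abundance in Jurkat mRNA is known independently (e.g., from RNA blots) in transcripts per 100,000 mRNAs, and that the Cy5 (tissue)/fluorescein (Jurkat) ratio scales linearly with relative abundance. Neither assumption is spelled out in the text, and the gene names and numbers are illustrative.

```python
from typing import Dict


def tissue_abundance(cy5_tissue: Dict[str, float],
                     fluor_jurkat: Dict[str, float],
                     jurkat_per_100k: Dict[str, float]) -> Dict[str, float]:
    """Estimated transcripts per 100,000 mRNAs in the tissue, for each gene."""
    out = {}
    for gene, tissue_signal in cy5_tissue.items():
        ratio = tissue_signal / fluor_jurkat[gene]   # tissue relative to Jurkat control
        out[gene] = ratio * jurkat_per_100k[gene]    # rescale by the Jurkat abundance
    return out


if __name__ == "__main__":
    heart = {"gene_x": 3000.0, "gene_y": 150.0}         # hypothetical Cy5 signals
    jurkat = {"gene_x": 1000.0, "gene_y": 300.0}        # hypothetical fluorescein signals
    jurkat_levels = {"gene_x": 20.0, "gene_y": 2.0}     # hypothetical per-100,000 levels
    print(tissue_abundance(heart, jurkat, jurkat_levels))
```

Because every tissue is compared against the same Jurkat reference, the resulting profiles can also be compared across tissues, which is the purpose of the shared control probe.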

Detectable expression was observed for all 15 of the heat shock and phorbol ester-regulated genes in the four tissue types examined (Fig. 3). In general, the expression level of each gene in Jurkat cells correlated rather closely with expression in the four tissues (Table 2; Fig. 3). Genes encoding β-actin and cytochrome c oxidase, the two most highly expressed of the 15 genes in Jurkat cells (Table 2), were highly expressed in bone marrow, brain, prostate, and heart (Fig. 3). Expression of cytochrome c oxidase, hsp90α, and the novel B7 sequence was significantly greater in heart than in the other tissues (Fig. 3).

DISCUSSION 

Many of the heat shock genes identified in this study encode proteins that function either as molecular "chaperones" (HSP90α, HSP90β, DnaJ, TCP-1) or as mediators of protein degradation (polyubiquitin). The identification of these sequences is consistent with the biochemical basis of heat shock induction (10-15). Proteins undergo denaturation at elevated temperatures, and those that fail to maintain proper conformation must be selectively degraded (10-15). It will be interesting to determine whether the three novel heat shock-inducible sequences (B7-B9) mediate protein folding and turnover or possess some other biochemical activity. Complete nucleotide sequence determination, conceptual translation, expression monitoring, and biochemical analysis should provide a detailed functional understanding of these genes.
FIG. 3. Transcript profiles of heat shock and phorbol ester-regulated genes. Gene expression levels per 100,000 mRNAs (y-axes) are shown for 15 genes (Table 1) in human bone marrow (red), brain (green), prostate (blue), and heart (yellow). Genes are grouped according to expression levels (A-C).



Phorbol ester, a potent activator of protein kinase C (16, 17), induced a set of genes distinct from those involved in the heat shock pathway. The most highly induced gene identified in this study, PAC-1, encodes a nuclear tyrosine phosphatase that may play a role in regulating transcription and cell cycle progression (18). NF-κB1, a second phorbol ester-inducible gene, is an intensively studied member of the Rel transcription factor family (19-21). The Rel proteins are activated by a large number of stimuli, including phorbol esters, cytokines, bacterial and viral pathogens, and ultraviolet light (19-21). Modest activation was observed for three sequences not known to be inducible by phorbol esters: phosphoglycerate kinase, β2-microglobulin, and a novel human gene (B19). Extensive expression monitoring with microarrays should help explain how each of these genes integrates into the highly complex phorbol ester signaling pathway.

It is striking that four novel human genes were discovered 
with an array of 1000 randomly chosen clones, particularly 
because the heat shock and phorbol ester signaling pathways 
have been so intensively studied (10-21). The facile discovery 
of these sequences underscores the fact that microarrays can 
be used for gene discovery in the absence of any sequence 
information. By this approach, clones are chosen at random from any library of interest, and only those clones that display interesting expression patterns are sequenced and characterized. This parallel assay, coupled with a modest DNA sequencing
analysis and gene discovery. 

Genes that are activated or repressed by a given stimulus 
provide functional clues to the cellular pathway involved 
(22-24). Detailed examination of these gene expression "signatures" can provide a dynamic view of the mode of action of a given signaling substance (22-24). Microarrays may thus allow rapid mechanistic examination of hormones, drugs, elicitors, and other small molecules; moreover, functional analysis of transcription factors, kinases, growth factors, cytokines, receptors, and other gene products should be possible.
Efforts are underway to develop mRNA amplification strate- 
gies to enable probe preparation from minute tissue samples. 
This capability might allow for high-throughput patient screen- 
ing in a clinical setting. 

The current detection limit of the assay allows monitoring of transcripts that represent ~1:500,000 (wt/wt) of the total mRNA. This 10-fold increase in sensitivity compared with the original report (4) was achieved largely by modifying the coupling chemistry, which reduced background fluorescence. The improvement is significant because approximately half the human genes identified in this study, including all four novel sequences, exhibited expression levels below the original detection limit of 1:50,000 (4).
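Because Fig. 3 reports abundance per 100,000 mRNAs, the two detection limits can be restated in those units: 1:50,000 is about 2 transcripts per 100,000 mRNAs and 1:500,000 about 0.2. The sketch assumes that a mass ratio approximates a molar ratio (i.e., transcripts of similar average length); the constant names and example abundances are hypothetical.

```python
ORIGINAL_LIMIT_DENOM = 50_000   # detection limit of the earlier report, 1:50,000 (wt/wt)
CURRENT_LIMIT_DENOM = 500_000   # detection limit reported here, 1:500,000 (wt/wt)


def limit_per_100k(denominator: int) -> float:
    """Express a 1:denominator limit as transcripts per 100,000 mRNAs."""
    return 100_000 / denominator


def detectable(abundance_per_100k: float, denominator: int) -> bool:
    """True if the abundance is at or above the given detection limit."""
    return abundance_per_100k >= limit_per_100k(denominator)


if __name__ == "__main__":
    for level in (5.0, 1.0, 0.1):  # hypothetical abundances per 100,000 mRNAs
        print(level,
              detectable(level, ORIGINAL_LIMIT_DENOM),
              detectable(level, CURRENT_LIMIT_DENOM))
```

Keeping the limits as integer denominators avoids floating-point rounding at the exact threshold values.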

The ability to detect 2-fold changes in expression was 
achieved by the use of two-color fluorescence in the labeling 
and detection schemes, digitized data collection, and custom 
software. The importance of this capability is underscored by 
the fact that nearly all of the genes examined here exhibited <6-fold changes in expression. The four novel genes, which showed ≤2.2-fold activation, were probably overlooked in previous screens that used conventional differential expression techniques. It may be possible to further improve the precision of the microarray assay by using closely related fluorescent analogs, such as Cy3 and Cy5, in the labeling and hybridization reactions.

Microarrays offer a number of advantages over other potential high-capacity approaches to expression analysis. The chip-based approach enables small hybridization volumes, high array densities, and the use of fluorescence labeling and detection schemes. These features provide a set of performance specifications that are unattainable with filter-based approaches (25, 26). The use of cDNA clones provides hybridization specificity that is not readily attained with oligonucleotide arrays (27-30). The parallel format of the assay provides a simultaneous differential expression readout for >1000 genes. This contrasts with sequencing-based methods, which require serial data collection for expression analysis (31, 32). A commercial source of cDNA microarrays would greatly speed the use of a chip-based approach to expression analysis. The availability of large numbers of ESTs (3) provides a rich resource of human cDNA clones for microarraying. The >400,000 ESTs in the public data bases represent a significant subset of all human genes (3, 33). Microarrays of thousands of ESTs will provide a powerful analytical tool for future human gene expression studies. The ~100,000 genes in the human genome (2, 33) emphasize the need for microarrays of greater density. Attempts to improve microdeposition techniques are underway and should allow construction of arrays containing a complete set of human gene targets (http://cmgm.stanford.edu/~schena/). Microarrays of ~100,000 cDNA elements would allow expression monitoring of the entire human genome in a single hybridization. This capacity, coupled with detailed biochemical analysis of the individual gene products, would greatly speed the functional analysis of the human genome.
The Genome Project adds a new dimension to questions on gene expression in humans and model systems. A chart on page 415 summarizes progress in the Caenorhabditis elegans Genome Project and indicates some ways information about sequences can be used.

News stories, Articles, Perspectives, Policy Forums, and Reports focus on technological developments, clinical applications, and ethical concerns resulting from the burgeoning of genomic information. [C. elegans image: F. Maduro and D. Pilgrim, University of Alberta]



The temporal, developmental, topographical, histological, and physiological patterns in which a gene is expressed provide clues to its biological role. The large and expanding database of complementary DNA (cDNA) sequences from many organisms (1) presents the opportunity of defining these patterns at the level of the whole genome.

For these studies, we used the small flowering plant Arabidopsis thaliana as a model organism. Arabidopsis possesses many advantages for gene expression analysis, including the fact that it has the smallest genome of any higher eukaryote examined to date (2). Forty-five cloned Arabidopsis cDNAs (Table 1), including 14 complete sequences and 31 expressed sequence tags (ESTs), were used as gene-specific targets. We obtained the ESTs by selecting cDNA clones at random from an Arabidopsis cDNA library. Sequence analysis revealed that 28 of the 31 ESTs matched sequences in the database (Table 1). Three additional cDNAs from other organisms served as controls in the experiments.

The 48 cDNAs, averaging ~1.0 kb, were amplified with the polymerase chain reaction (PCR) and deposited into individual wells of a 96-well microtiter plate. Each sample was duplicated in two adjacent wells to allow the reproducibility of the arraying and hybridization process to be tested. Samples from the microtiter plate were printed onto glass microscope slides in an area measuring 3.5 mm by 5.5 mm with the use of a high-speed arraying machine (3). The arrays were processed by chemical and heat treatment to attach the DNA sequences to the glass surface and denature them (3). Three arrays, printed in a single lot, were used for the experiments here. A single microtiter plate of PCR products provides sufficient material to print at least 500 arrays.
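The duplicate-spotting layout described above (48 cDNAs, each in two adjacent wells of a 96-well plate) can be sketched as a simple mapping from cDNA to well pair. This is a minimal illustration, assuming row letters a-h and column numbers 1-12 filled row by row, as the positions in Table 1 (a1,2; a3,4; ...) suggest; the helper `duplicate_layout` and the placeholder names are hypothetical.

```python
from string import ascii_lowercase

ROWS = ascii_lowercase[:8]  # plate rows a-h (assumed naming, as in Table 1)
COLS = 12                   # plate columns 1-12


def duplicate_layout(cdnas: list[str]) -> dict[str, tuple[str, str]]:
    """Assign each cDNA to a pair of horizontally adjacent wells.

    Hypothetical helper: fills the plate row by row, two wells per cDNA,
    so 48 cDNAs occupy all 96 wells.
    """
    capacity = len(ROWS) * COLS // 2
    if len(cdnas) > capacity:
        raise ValueError(f"at most {capacity} cDNAs fit in duplicate")
    layout = {}
    for i, name in enumerate(cdnas):
        row = ROWS[(2 * i) // COLS]   # advance to the next row every 6 cDNAs
        col = (2 * i) % COLS + 1      # first well of the pair: 1, 3, ..., 11
        layout[name] = (f"{row}{col}", f"{row}{col + 1}")
    return layout


# Placeholder names only; Table 1 gives the actual cDNA assignments.
names = [f"cDNA{i:02d}" for i in range(1, 49)]
layout = duplicate_layout(names)
print(layout["cDNA01"], layout["cDNA48"])  # ('a1', 'a2') ('h11', 'h12')
```

Pairing wells horizontally keeps each duplicate adjacent on the printed array, which is what lets the two spots serve as a reproducibility check.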

Fluorescent probes were prepared from total Arabidopsis mRNA (4) by a single round of reverse transcription (5). The Arabidopsis mRNA was supplemented with human acetylcholine receptor (AChR) mRNA at a dilution of 1:10,000 (w/w) before cDNA synthesis, to provide an internal standard for calibration (5). The resulting fluorescently



Quantitative Monitoring of Gene Expression Patterns with a Complementary DNA Microarray

Mark Schena,* Dari Shalon,*† Ronald W. Davis, Patrick O. Brown†

A high-capacity system was developed to monitor the expression of many genes in parallel. Microarrays prepared by high-speed robotic printing of complementary DNAs on glass were used for quantitative expression measurements of the corresponding genes. Because of the small format and high density of the arrays, hybridization volumes of 2 microliters could be used that enabled detection of rare transcripts in probe mixtures derived from 2 micrograms of total cellular messenger RNA. Differential expression measurements of 45 Arabidopsis genes were made by means of simultaneous, two-color fluorescence hybridization.



with a laser (3). A high-sensitivity scan gave signals that saturated the detector at nearly all of the Arabidopsis target sites (Fig. 1A). Calibration relative to the AChR mRNA standard (Fig. 1A) established a sensitivity limit of ~1:50,000. No detectable hybridization was observed to either the rat glucocorticoid receptor (Fig. 1A) or the yeast TRP4 (Fig. 1A) targets even at the highest scanning sensitivity. A moderate-sensitivity scan of the same array allowed linear detection of the more abundant transcripts (Fig. 1B). Quantitation of both scans revealed a range of expression levels spanning three orders of magnitude for the 45 genes tested (Table 2). RNA blots (7) for several genes (Fig. 2) corroborated the expression levels measured with the microarray to within a factor of 5 (Table 2).
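Calibration against a spiked-in standard can be illustrated with a short sketch. It assumes, as a simplification not stated in the text, that background-corrected fluorescence is proportional to transcript abundance within a scan's linear range (the moderate-sensitivity scan above is described as linear only for the more abundant transcripts). Under that assumption, a spot's abundance follows from the ratio of its signal to the AChR spike-in signal. The function names and signal values are hypothetical.

```python
def calibrated_level(target_signal: float, spike_signal: float,
                     spike_dilution: float) -> float:
    """Return N such that the target's estimated abundance is 1:N (w/w).

    Assumes signal is proportional to abundance within the scan's linear
    range, and that the spike-in was added at 1:spike_dilution (w/w)
    before cDNA synthesis. Both are modeling assumptions of this sketch.
    """
    if target_signal <= 0 or spike_signal <= 0:
        raise ValueError("signals must be positive and background-corrected")
    # (1/N) / (1/spike_dilution) = target_signal / spike_signal
    return spike_dilution * spike_signal / target_signal


def format_level(n: float) -> str:
    """Render an abundance of 1:N in the 1:10,000 style used in the text."""
    return f"1:{n:,.0f}"


# Illustrative numbers only: a spot four times brighter than a 1:10,000 spike-in.
n = calibrated_level(target_signal=4000.0, spike_signal=1000.0, spike_dilution=10_000)
print(format_level(n))  # 1:2,500
```

A single-point calibration like this is only as good as the linearity assumption; spots that saturate the detector, as in the high-sensitivity scan, would have to be read from a lower-sensitivity scan instead.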

Differential gene expression was investigated with a simultaneous, two-color hybridization scheme, which served to minimize experimental variation inherent in the comparison of independent hybridizations. Fluorescent probes were prepared from mRNA sources with the use of reverse transcriptase in the presence of fluorescein- and lissamine-labeled nucleotide analogs, respectively (5). The two probes were then mixed together in equal proportions, hybridized to a single array, and scanned separately for fluorescein and lissamine emission after independent excitation of the two fluorophores (3).

[Figure 1 image: array scans; panels (B) Moderate sensitivity, (C) Wild type, (D) HAT4 transgenic, (E) Root tissue, (F) Leaf tissue; color bars labeled Expression level (w/w).]

Fig. 1. Gene expression monitored with the use of cDNA microarrays. Fluorescent scans represented in pseudocolor correspond to hybridization intensities. Color bars were calibrated from the signal obtained with the use of known concentrations of human AChR mRNA in independent experiments. Numbers and letters on the axes mark the position of each cDNA. (A) High-sensitivity fluorescein scan after hybridization with fluorescein-labeled cDNA derived from wild-type plants. (B) Same array as in (A) but scanned at moderate sensitivity. (C and D) A single array was probed with a 1:1 mixture of fluorescein-labeled cDNA from wild-type plants and lissamine-labeled cDNA from HAT4-transgenic plants. The single array was then scanned successively to detect the fluorescein fluorescence corresponding to mRNA from wild-type plants (C) and the lissamine fluorescence corresponding to mRNA from HAT4-transgenic plants (D). (E and F) A single array was probed with a 1:1 mixture of fluorescein-labeled cDNA from root tissue and lissamine-labeled cDNA from leaf tissue. The single array was then scanned successively to detect the

To test whether overexpression of a single gene could be detected in a pool of total Arabidopsis mRNA, we used a microarray to analyze a transgenic line overexpressing the single transcription factor HAT4 (8). Fluorescent probes representing mRNA from wild-type and HAT4-transgenic plants were labeled with fluorescein and lissamine, respectively; the two probes were then mixed and hybridized to a single array. An intense hybridization signal was observed at the position of the HAT4 cDNA in the lissamine-specific scan (Fig. 1D), but not in the fluorescein-specific scan of the same array (Fig. 1C). Calibration with AChR mRNA added to the fluorescein and lissamine cDNA synthesis reactions at dilutions of 1:10,000 (Fig. 1C) and 1:100 (Fig. 1D), respectively, revealed a 50-fold elevation of HAT4 mRNA in the transgenic line relative to its abundance in wild-type plants (Table 2). This magnitude of HAT4 overexpression matched that inferred from the Northern (RNA) analysis within a factor of 2 (Fig. 2 and Table 2). Expression of all the other genes monitored on the array differed by less than a factor of 5 between HAT4-transgenic and wild-type plants (Fig. 1, C



[Figure 2 image: RNA blots for CAB1, HAT4, ROC1, and human AChR; columns: wild type and HAT4 transgenic; axis: mRNA (ng).]

Fig. 2. Gene expression monitored with RNA (Northern) blot analysis. Designated amounts of mRNA from wild-type and HAT4-transgenic plants were spotted onto nylon membranes and



and D, and Table 2). Hybridization of fluorescein-labeled glucocorticoid receptor cDNA (Fig. 1C) and lissamine-labeled TRP4 cDNA (Fig. 1D) verified the presence of the negative control targets and the lack of optical cross talk between the two fluorophores.
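Because each channel was calibrated against its own AChR spike-in, a fold change can be computed directly from the two calibrated levels: an abundance of 1:N corresponds to a fraction 1/N, so the transgenic/wild-type ratio is N_wild-type / N_transgenic. The sketch below applies this to the HAT4 values listed in Table 2; the helper name `fold_change` is hypothetical, and the reading of the table values is this sketch's assumption.

```python
def fold_change(n_reference: float, n_test: float) -> float:
    """Fold change of a transcript in a test sample relative to a reference.

    Each argument is N from a calibrated expression level of 1:N (w/w),
    obtained separately for each channel with that channel's own spike-in.
    Abundance is 1/N, so (1/n_test) / (1/n_reference) = n_reference / n_test.
    """
    if n_reference <= 0 or n_test <= 0:
        raise ValueError("calibrated levels must be positive")
    return n_reference / n_test


# HAT4 levels from Table 2: wild type vs. HAT4-transgenic (tg).
microarray = fold_change(8300, 150)  # microarray columns
rna_blot = fold_change(8300, 210)    # RNA blot columns
print(f"microarray {microarray:.1f}-fold, RNA blot {rna_blot:.1f}-fold, "
      f"ratio {microarray / rna_blot:.2f}")
# microarray 55.3-fold, RNA blot 39.5-fold, ratio 1.40
```

The Table 2 microarray values give roughly 55-fold, consistent with the ~50-fold elevation reported in the text, and the two methods agree within the stated factor of 2.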

To explore a more complex alteration in expression patterns, we performed a second two-color hybridization experiment with fluorescein- and lissamine-labeled probes prepared from root and leaf mRNA, respectively. The scanning sensitivities for the two fluorophores were normalized by matching the signals resulting from AChR mRNA, which was added to both cDNA synthesis reactions at a dilution of 1:1000 (Fig. 1, E and F). A comparison of the scans revealed widespread differences in gene expression between root and leaf tissue (Fig. 1, E and F). The mRNA from the light-regulated CAB1 gene was ~500-fold more abundant in leaf (Fig. 1F) than in root tissue (Fig. 1E). The expression of 26 other genes differed between root and leaf tissue by more than a factor of 5 (Fig. 1, E and F).
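The normalization and screening step just described can be sketched as follows: rescale one channel so that its AChR spike-in signal matches the other channel's, then flag genes whose normalized ratio exceeds 5-fold in either direction. This is a minimal illustration, not the authors' software; it assumes background-corrected, positive signals and the same gene set in both channels, and the function name, gene names, and signal values are hypothetical.

```python
def differential_genes(ch1: dict[str, float], ch2: dict[str, float],
                       control: str, threshold: float = 5.0) -> dict[str, float]:
    """Flag genes whose normalized ch2/ch1 ratio exceeds threshold either way.

    Channel 2 is rescaled so that its spike-in control signal equals that of
    channel 1. Signals are assumed background-corrected and positive, and
    both dicts are assumed to contain the same genes.
    """
    if ch1[control] <= 0 or ch2[control] <= 0:
        raise ValueError("control signals must be positive")
    scale = ch1[control] / ch2[control]
    flagged = {}
    for gene, s1 in ch1.items():
        if gene == control:
            continue
        s2 = ch2[gene] * scale  # channel 2 signal on channel 1's scale
        if s1 <= 0 or s2 <= 0:
            raise ValueError(f"non-positive signal for {gene}")
        ratio = s2 / s1
        if ratio > threshold or ratio < 1.0 / threshold:
            flagged[gene] = ratio
    return flagged


# Hypothetical signals (fluorescein channel = root, lissamine channel = leaf).
root = {"AChR": 1000.0, "geneA": 200.0, "geneB": 500.0, "geneC": 50.0}
leaf = {"AChR": 2000.0, "geneA": 4000.0, "geneB": 1200.0, "geneC": 10.0}
print(differential_genes(root, leaf, control="AChR"))
# {'geneA': 10.0, 'geneC': 0.1}
```

Normalizing to a spike-in added at the same dilution to both reactions, rather than to total signal, avoids assuming that most genes are unchanged between tissues, which matters here because many genes differ between root and leaf.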

The HAT4-transgenic line we examined has elongated hypocotyls, early flowering, poor germination, and altered pigmentation (8). Although changes in expression were observed for HAT4, large changes in expression were not observed for any of the other 44 genes we examined. This was somewhat surprising, particularly because comparative analysis of leaf and root tissue identified 27 differentially expressed genes. Analysis of an expanded set of genes may be required to identify genes whose expression changes upon HAT4 overexpression; alternatively, a comparison of mRNA populations from specific tissues of wild-type and HAT4-transgenic plants may allow identification of downstream genes.

At the current density of robotic printing, it is feasible to scale up the fabrication process to produce arrays containing 20,000 cDNA targets. At this density, a single array would be sufficient to provide gene-specific targets encompassing nearly the entire repertoire of expressed genes in the Arabidopsis genome (2). The availability of 20,274 ESTs from Arabidopsis (1, 9) would provide a rich source of templates for such studies.

The estimated 100,000 genes in the human genome (10) exceed the number of Arabidopsis genes by a factor of 5 (2). This modest increase in complexity suggests that similar cDNA microarrays, prepared from the rapidly growing repertoire of human ESTs (1), could be used to determine the expression patterns of tens of thousands of human genes in diverse cell types. Coupling an amplification strategy to the reverse transcription reaction (11) could make it feasible to monitor expression even in minute tissue samples. A wide variety of acute and chronic physiological and pathological conditions might lead to characteristic changes in the patterns of gene expression in peripheral blood cells or other easily sampled tissues. In concert with cDNA microarrays for monitoring complex expression patterns, these tissues might therefore serve as sensitive in vivo sensors for clinical diagnosis. Microarrays of cDNAs could thus provide a useful link between human gene sequences and clinical medicine.

Table 2. Gene expression monitoring by microarray and RNA blot analyses; tg, HAT4-transgenic. See Table 1 for additional gene information. Expression levels (w/w) were calibrated with the use of known amounts of human AChR mRNA. Values for the microarray were determined from microarray scans (Fig. 1); values for the RNA blot were determined from RNA blots (Fig. 2).

                Expression level (w/w)
Gene            Microarray    RNA blot
CAB1            1:48          1:83
CAB1 (tg)       1:120         1:150
HAT4            1:8300        1:8300
HAT4 (tg)       1:150         1:210
ROC1            [illegible]



Table 1. Sequences contained on the cDNA microarray. Shown is the position, the known or putative function, and the accession number of each cDNA in the microarray (Fig. 1). All but three of the ESTs used in this study matched a sequence in the database. NADH, reduced form of nicotinamide adenine dinucleotide; ATPase, adenosine triphosphatase; GTP, guanosine triphosphate.

Position  cDNA     Function
a1,2      AChR     Human AChR
a3,4      EST3     Actin
a5,6      EST6     NADH dehydrogenase
a7,8      AAC1     Actin 1
a9,10     EST12    Unknown
a11,12    EST13    Actin
b1,2      CAB1     Chlorophyll a/b binding
b3,4      EST17    Phosphoglycerate kinase
b5,6      GA4      Gibberellic acid biosynthesis
b7,8      EST19    Unknown
b9,10              G-box binding factor 1
b11,12    EST23    Elongation factor
c1,2      EST29    Aldolase
c3,4      GBF-2    G-box binding factor 2
c5,6      EST34    Chloroplast protease
c7,8      EST35    Unknown
c9,10     EST41    Catalase
c11,12    rGR      Rat glucocorticoid receptor
d1,2      EST42    Unknown
d3,4      EST45    ATPase
d5,6      HAT1     Homeobox-leucine zipper 1
d7,8      EST46    Light harvesting complex
d9,10     EST49    Unknown
d11,12    HAT2     Homeobox-leucine zipper 2
e1,2      HAT4     Homeobox-leucine zipper 4
e3,4      EST50    Phosphoribulokinase
e5,6      HAT5     Homeobox-leucine zipper 5
e7,8      EST51    Unknown
e9,10     HAT22    Homeobox-leucine zipper 22
e11,12    EST52    Oxygen evolving
f1,2      EST59    Unknown
f3,4      KNAT1    Knotted-like homeobox 1
f5,6      EST60    RuBisCO small subunit
f7,8      EST69    Translation elongation factor
f9,10     PPH1     Protein phosphatase 1
f11,12    EST70    Unknown
g1,2      EST75    Chloroplast protease
g3,4      EST78    Unknown
g5,6      ROC1     Cyclophilin
g7,8      EST82    GTP binding
g9,10     EST83    Unknown
g11,12    EST84    Unknown
h1,2      EST91    Unknown
h3,4      EST96    Unknown
h5,6      SAR1     Synaptobrevin
h7,8      EST100   Light harvesting complex
h9,10     EST103   Light harvesting complex
h11,12    TRP4     Yeast tryptophan biosynthesis

Accession numbers (as listed in the source column): H36236, Z27010, M20016, U35594†, T45783, M85150, T44490, L37126, U35595†, X63894, X52256, T04477, X63895, R87034, T14152, T22720, M14053, U36596†, J04185, U09332, T04063, T76267, U09335, M90394, T04344, M90416, Z33575, U09335, T21749, Z34507, U14174, X14564, T42799, U34803, T44621, T43598, R55481, L14844, X59152, Z33795, T45278, T13832, R54816, M90418, Z18205, X03909
Gene Therapy in Peripheral Blood Lymphocytes and Bone Marrow for ADA Immunodeficient Patients

Claudio Bordignon,* Luigi D. Notarangelo, Nadia Nobili, Giuliana Ferrari, Giulia Casorati, Paola Panina, Evelina Mazzolari, Daniela Maggioni, Claudia Rossi, Paolo Servida, Alberto G. Ugazio, Fulvio Mavilio

Adenosine deaminase (ADA) deficiency results in severe combined immunodeficiency, the first genetic disorder treated by gene therapy. Two different retroviral vectors were used to transfer ex vivo the human ADA minigene into bone marrow cells and peripheral blood lymphocytes from two patients undergoing exogenous enzyme replacement therapy. After 2 years of treatment, long-term survival of T and B lymphocytes, marrow cells, and granulocytes expressing the transferred ADA gene was demonstrated and resulted in normalization of the immune repertoire and restoration of cellular and humoral immunity. After discontinuation of treatment, T lymphocytes, derived from transduced peripheral blood lymphocytes, were progressively replaced by marrow-derived T cells in both patients. These results indicate successful gene transfer into long-lasting progenitor cells, producing a functional multilineage progeny.



Severe combined immunodeficiency associated with inherited deficiency of ADA (1) is usually fatal unless affected children are kept in protective isolation or the immune system is reconstituted by bone marrow transplantation from a human leukocyte antigen (HLA)-identical sibling donor (2). This is the therapy of choice, although it is available only for a minority of patients. In recent years, other forms of therapy have been developed, including transplants from haploidentical donors (3, 4), exogenous enzyme replacement (5), and somatic-cell gene therapy (6-9).

We previously reported a preclinical model in which ADA gene transfer and expression successfully restored immune functions in human ADA-deficient (ADA-) peripheral blood lymphocytes (PBLs) in immunodeficient mice in vivo (10, 11). On the basis of these preclinical results, the clinical application of gene therapy for the treatment of ADA- SCID (severe combined immunodeficiency disease) patients who previously failed exogenous enzyme replacement therapy was approved by our Institutional Ethical Committees and by the Italian National Committee for Bioethics (12). In addition to evaluating the safety and efficacy of the gene therapy procedure, the aim of the study was to define the relative role of PBLs and hematopoietic stem cells in the long-term reconstitution of immune functions after retroviral vector-mediated ADA gene transfer. For this purpose, two structurally identical vectors expressing the human ADA complementary DNA (cDNA), distinguishable by the presence of alternative restriction sites in a nonfunctional region of the viral long-terminal repeat (LTR), were used to transduce PBLs and bone



